tabstack-extractor
Extract structured data from websites using Tabstack API. Use when you need to scrape job listings, news articles, product pages, or any structured web content. Provides JSON schema-based extraction and clean markdown conversion. Requires TABSTACK_API_KEY environment variable.
Install
mkdir -p .claude/skills/tabstack-extractor && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7891" && unzip -o skill.zip -d .claude/skills/tabstack-extractor && rm skill.zip
Installs to .claude/skills/tabstack-extractor
About this skill
Tabstack Extractor
Overview
This skill enables structured data extraction from websites using the Tabstack API. It's ideal for web scraping tasks where you need consistent, schema-based data extraction from job boards, news sites, product pages, or any structured content.
Quick Start
1. Install Babashka (if needed)
# Option A: From GitHub (recommended for sharing)
curl -s https://raw.githubusercontent.com/babashka/babashka/master/install | bash
# Option B: From Nix
nix-shell -p babashka
# Option C: From Homebrew
brew install borkdude/brew/babashka
2. Set up API Key
Option A: Environment variable (recommended)
export TABSTACK_API_KEY="your_api_key_here"
Option B: Configuration file
mkdir -p ~/.config/tabstack
echo '{:api-key "your_api_key_here"}' > ~/.config/tabstack/config.edn
Get an API key: Sign up at Tabstack Console
3. Test Connection
bb scripts/tabstack.clj test
4. Extract Markdown (Simple)
bb scripts/tabstack.clj markdown "https://example.com"
5. Extract JSON (Start Simple)
# Start with simple schema (fast, reliable)
bb scripts/tabstack.clj json "https://example.com" references/simple_article.json
# Try more complex schemas (may be slower)
bb scripts/tabstack.clj json "https://news.site" references/news_schema.json
6. Advanced Features
# Extract with retry logic (3 retries, 1s delay)
bb scripts/tabstack.clj json-retry "https://example.com" references/simple_article.json
# Extract with caching (24-hour cache)
bb scripts/tabstack.clj json-cache "https://example.com" references/simple_article.json
# Batch extract from URLs file
echo "https://example.com" > urls.txt
echo "https://example.org" >> urls.txt
bb scripts/tabstack.clj batch urls.txt references/simple_article.json
Core Capabilities
1. Markdown Extraction
Extract clean, readable markdown from any webpage. Useful for content analysis, summarization, or archiving.
When to use: When you need the textual content of a page without the HTML clutter.
Example use cases:
- Extract article content for summarization
- Archive webpage content
- Analyze blog post content
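For archiving, a small wrapper that derives a filesystem-safe filename from the URL can be handy. This is a hypothetical helper, not part of the skill; the actual `tabstack.clj markdown` call from Quick Start is left commented out since it needs a live API key:

```shell
# Hypothetical archive helper: turn a URL into a safe filename, then
# save the extracted markdown there.
url="https://example.com/blog/post-1"
slug=$(echo "$url" | sed -E 's|https?://||; s|[/.]|_|g')
out="archive/${slug}.md"
mkdir -p archive
# bb scripts/tabstack.clj markdown "$url" > "$out"
echo "$out"
```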
2. JSON Schema Extraction
Extract structured data using JSON schemas. Define exactly what data you want and get it in a consistent format.
When to use: When scraping job listings, product pages, news articles, or any structured data.
Example use cases:
- Scrape job listings from BuiltIn/LinkedIn
- Extract product details from e-commerce sites
- Gather news articles with consistent metadata
3. Schema Templates
Pre-built schemas for common scraping tasks. See references/ directory for templates.
Available schemas:
- Job listing schema (see references/job_schema.json)
- News article schema
- Product page schema
- Contact information schema
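A minimal template in the spirit of references/simple_article.json might look like the following. The four field names here are illustrative; the shipped file may use different ones:

```shell
# Write a minimal article schema to a temp file (field names are
# assumptions, not necessarily those in references/simple_article.json).
cat > /tmp/simple_article.json <<'EOF'
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "author": {"type": "string"},
    "date": {"type": "string"},
    "body": {"type": "string"}
  }
}
EOF
grep -c '"type": "string"' /tmp/simple_article.json
```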
Workflow: Job Scraping Example
Follow this workflow to scrape job listings:
- Identify target sites - BuiltIn, LinkedIn, company career pages
- Choose or create schema - Use references/job_schema.json or customize it
- Test extraction - Run a single page to verify the schema works
- Scale up - Process multiple URLs
- Store results - Save to database or file
Example job schema:
{
"type": "object",
"properties": {
"title": {"type": "string"},
"company": {"type": "string"},
"location": {"type": "string"},
"description": {"type": "string"},
"salary": {"type": "string"},
"apply_url": {"type": "string"},
"posted_date": {"type": "string"},
"requirements": {"type": "array", "items": {"type": "string"}}
}
}
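Before scaling up, it can help to sanity-check the schema file itself. The sketch below writes the job schema above to a temp file and lists its top-level property names (python3 is assumed to be available):

```shell
# Write the job schema above to a file, then list its top-level
# property names as a quick sanity check.
cat > /tmp/job_schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "company": {"type": "string"},
    "location": {"type": "string"},
    "description": {"type": "string"},
    "salary": {"type": "string"},
    "apply_url": {"type": "string"},
    "posted_date": {"type": "string"},
    "requirements": {"type": "array", "items": {"type": "string"}}
  }
}
EOF
python3 -c 'import json; print(sorted(json.load(open("/tmp/job_schema.json"))["properties"]))'
```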
Integration with Other Skills
Combine with Web Search
- Use web_search to find relevant URLs
- Use Tabstack to extract structured data from those URLs
- Store results in Datalevin (future skill)
Combine with Browser Automation
- Use the browser tool to navigate complex sites
- Extract page URLs
- Use Tabstack for structured extraction
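The handoff between these steps is just a file of URLs. A sketch of the pipeline, with the gathered URLs hard-coded here in place of real search or browsing output:

```shell
# Dedupe gathered URLs into a file the batch command can consume.
printf 'https://example.com/a\nhttps://example.com/a\nhttps://example.com/b\n' \
  | sort -u > /tmp/found_urls.txt
cat /tmp/found_urls.txt
# bb scripts/tabstack.clj batch /tmp/found_urls.txt references/simple_article.json
```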
Error Handling
Common issues and solutions:
- Authentication failed - Check the TABSTACK_API_KEY environment variable
- Invalid URL - Ensure the URL is accessible and correct
- Schema mismatch - Adjust schema to match page structure
- Rate limiting - Add delays between requests
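The skill's json-retry command already handles retries internally (3 retries, 1s delay); for other commands, a generic outer wrapper can be sketched in plain shell. Here `true` stands in for a real extraction command:

```shell
# Generic retry-with-delay wrapper: run a command up to N times,
# sleeping between attempts.
retry() {
  attempts=$1; shift
  delay=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "failed after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
# 'true' is a placeholder for e.g. bb scripts/tabstack.clj json ...
retry 3 1 true && echo "ok"
```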
Resources
scripts/
- tabstack.clj - Main API wrapper in Babashka (recommended; has retry logic, caching, batch processing)
- tabstack_curl.sh - Bash/curl fallback (simple, no dependencies)
- tabstack_api.py - Python API wrapper (requires the requests module)
references/
- job_schema.json - Template schema for job listings
- api_reference.md - Tabstack API documentation
Best Practices
- Start small - Test with single pages before scaling
- Respect robots.txt - Check site scraping policies
- Add delays - Avoid overwhelming target sites
- Validate schemas - Test schemas on sample pages
- Handle errors gracefully - Implement retry logic for failed requests
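The "add delays" and "start small" practices combine naturally in a batch loop. A sketch, with the real extraction call commented out since it needs a live TABSTACK_API_KEY:

```shell
# Polite batch loop: one URL at a time, with a delay between requests.
printf 'https://example.com\nhttps://example.org\n' > /tmp/urls.txt
while IFS= read -r url; do
  # bb scripts/tabstack.clj json "$url" references/simple_article.json
  echo "would extract: $url"
  sleep 0.1   # use 1s or more against real sites
done < /tmp/urls.txt
```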
Teaching Focus: How to Create Schemas
This skill is designed to teach agents how to use Tabstack API effectively. The key is learning to create appropriate JSON schemas for different websites.
Learning Path
- Start Simple - Use references/simple_article.json (4 basic fields)
- Test Extensively - Try schemas on multiple page types
- Iterate - Add fields based on what the page actually contains
- Optimize - Remove unnecessary fields for speed
See Schema Creation Guide for detailed instructions and examples.
Common Mistakes to Avoid
- Over-complex schemas - Start with 2-3 fields, not 20
- Missing fields - Don't require fields that don't exist on the page
- No testing - Always test with example.com first, then target sites
- Ignoring timeouts - Complex schemas take longer (45s timeout)
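A cheap way to catch schema mistakes before burning a slow extraction call is to confirm the schema file at least parses as JSON. A sketch (python3 assumed available):

```shell
# Pre-flight check: confirm a schema file parses as JSON before using it.
schema=/tmp/candidate_schema.json
echo '{"type": "object", "properties": {"title": {"type": "string"}}}' > "$schema"
if python3 -m json.tool "$schema" > /dev/null 2>&1; then
  echo "schema OK: $schema"
else
  echo "schema is not valid JSON: $schema" >&2
fi
```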
Babashka Advantages
Using Babashka for this skill provides:
- Single binary - Easy to share/install (GitHub releases, brew, nix)
- Fast startup - No JVM warmup, ~50ms startup time
- Built-in HTTP client - No external dependencies
- Clojure syntax - Concise and expressive
- Retry logic & caching - Built into the skill
- Batch processing - Parallel extraction for multiple URLs
Example User Requests
For this skill to trigger:
- "Scrape job listings from Docker careers page"
- "Extract the main content from this article"
- "Get structured product data from this e-commerce page"
- "Pull all the news articles from this site"
- "Extract contact information from this company page"
- "Batch extract job listings from these 20 URLs"
- "Get cached results for this page (avoid API calls)"