tabstack-extractor

Name: tabstack-extractor
Author: openclaw

Extract structured data from websites using the Tabstack API. Use when you need to scrape job listings, news articles, product pages, or any structured web content. Provides JSON schema-based extraction and clean markdown conversion. Requires the TABSTACK_API_KEY environment variable.

Install

mkdir -p .claude/skills/tabstack-extractor && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7891" && unzip -o skill.zip -d .claude/skills/tabstack-extractor && rm skill.zip

Installs to .claude/skills/tabstack-extractor

About this skill

Tabstack Extractor

Overview

This skill enables structured data extraction from websites using the Tabstack API. It's ideal for web scraping tasks where you need consistent, schema-based data extraction from job boards, news sites, product pages, or any structured content.

Quick Start

1. Install Babashka (if needed)

# Option A: From GitHub (recommended for sharing)
curl -s https://raw.githubusercontent.com/babashka/babashka/master/install | bash

# Option B: From Nix
nix-shell -p babashka

# Option C: From Homebrew
brew install borkdude/brew/babashka

2. Set up API Key

Option A: Environment variable (recommended)

export TABSTACK_API_KEY="your_api_key_here"

Option B: Configuration file

mkdir -p ~/.config/tabstack
echo '{:api-key "your_api_key_here"}' > ~/.config/tabstack/config.edn
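The two options above can be combined into one lookup. The Python sketch below checks the environment variable first and falls back to the config file. The precedence order is an assumption, `resolve_api_key` and `parse_api_key` are hypothetical helpers (not part of the skill's scripts), and the regex handles only the simple one-key EDN form shown above.

```python
import os
import re
from pathlib import Path

# Path taken from Option B above.
CONFIG_PATH = Path.home() / ".config" / "tabstack" / "config.edn"


def parse_api_key(edn_text):
    """Pull the :api-key string out of a minimal EDN map like {:api-key "..."}."""
    match = re.search(r':api-key\s+"([^"]*)"', edn_text)
    return match.group(1) if match else None


def resolve_api_key(env=None, config_path=CONFIG_PATH):
    """Prefer TABSTACK_API_KEY from the environment, then the config file (assumed order)."""
    env = os.environ if env is None else env
    key = env.get("TABSTACK_API_KEY")
    if key:
        return key
    path = Path(config_path)
    if path.is_file():
        return parse_api_key(path.read_text(encoding="utf-8"))
    return None  # neither option is configured
```

Calling `resolve_api_key()` with no arguments reads the real environment and the default config path; passing `env` and `config_path` explicitly makes the lookup easy to test.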

Get an API key: Sign up at Tabstack Console

3. Test Connection

bb scripts/tabstack.clj test

4. Extract Markdown (Simple)

bb scripts/tabstack.clj markdown "https://example.com"

5. Extract JSON (Start Simple)

# Start with simple schema (fast, reliable)
bb scripts/tabstack.clj json "https://example.com" references/simple_article.json

# Try more complex schemas (may be slower)
bb scripts/tabstack.clj json "https://news.site" references/news_schema.json

6. Advanced Features

# Extract with retry logic (3 retries, 1s delay)
bb scripts/tabstack.clj json-retry "https://example.com" references/simple_article.json

# Extract with caching (24-hour cache)
bb scripts/tabstack.clj json-cache "https://example.com" references/simple_article.json

# Batch extract from URLs file
echo "https://example.com" > urls.txt
echo "https://example.org" >> urls.txt
bb scripts/tabstack.clj batch urls.txt references/simple_article.json
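To make the caching idea concrete, here is a minimal Python sketch of a 24-hour file cache keyed by URL and schema. It illustrates how such a cache could work and is not a description of what `json-cache` does internally. `cached_extract`, `cache_key`, the cache directory, and the `fetch` callable are all hypothetical names.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, matching the comment above


def cache_key(url, schema):
    """Stable key: the same URL and schema always map to the same cache file."""
    payload = json.dumps({"url": url, "schema": schema}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_extract(url, schema, fetch, cache_dir, now=time.time):
    """Return a cached result younger than the TTL, else call fetch(url, schema)."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (cache_key(url, schema) + ".json")
    if path.is_file():
        entry = json.loads(path.read_text(encoding="utf-8"))
        if now() - entry["stored_at"] < CACHE_TTL_SECONDS:
            return entry["result"]
    # `fetch` is a hypothetical callable that performs the real extraction.
    result = fetch(url, schema)
    entry = {"stored_at": now(), "result": result}
    path.write_text(json.dumps(entry), encoding="utf-8")
    return result
```

Including the schema in the key means that changing a schema invalidates old results for the same URL, which matters while a schema is still being iterated on.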

Core Capabilities

1. Markdown Extraction

Extract clean, readable markdown from any webpage. Useful for content analysis, summarization, or archiving.

When to use: When you need the textual content of a page without the HTML clutter.

Example use cases:

Extract article content for summarization
Archive webpage content
Analyze blog post content

2. JSON Schema Extraction

Extract structured data using JSON schemas. Define exactly what data you want and get it in a consistent format.

When to use: When scraping job listings, product pages, news articles, or any structured data.

Example use cases:

Scrape job listings from BuiltIn/LinkedIn
Extract product details from e-commerce sites
Gather news articles with consistent metadata

3. Schema Templates

Pre-built schemas for common scraping tasks. See the references/ directory for templates.

Available schemas:

Job listing schema (see references/job_schema.json)
News article schema
Product page schema
Contact information schema

Workflow: Job Scraping Example

Follow this workflow to scrape job listings:

Identify target sites - BuiltIn, LinkedIn, company career pages
Choose or create schema - Use references/job_schema.json or customize
Test extraction - Run a single page to verify schema works
Scale up - Process multiple URLs
Store results - Save to database or file

Example job schema:

{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "company": {"type": "string"},
    "location": {"type": "string"},
    "description": {"type": "string"},
    "salary": {"type": "string"},
    "apply_url": {"type": "string"},
    "posted_date": {"type": "string"},
    "requirements": {"type": "array", "items": {"type": "string"}}
  }
}
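Before storing results, it can help to check each extracted record against the schema. The Python sketch below is a deliberately small checker that understands only the `string`, `array`, and `object` types used in the job schema above. It is not a full JSON Schema validator, and `check_value` is a hypothetical helper name.

```python
JOB_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "company": {"type": "string"},
        "location": {"type": "string"},
        "description": {"type": "string"},
        "salary": {"type": "string"},
        "apply_url": {"type": "string"},
        "posted_date": {"type": "string"},
        "requirements": {"type": "array", "items": {"type": "string"}},
    },
}


def check_value(value, schema, path="$"):
    """Return a list of problems; an empty list means the value fits the schema."""
    problems = []
    expected = schema.get("type")
    if expected == "string":
        if not isinstance(value, str):
            problems.append(f"{path}: expected string, got {type(value).__name__}")
    elif expected == "array":
        if not isinstance(value, list):
            problems.append(f"{path}: expected array, got {type(value).__name__}")
        else:
            item_schema = schema.get("items", {})
            for index, item in enumerate(value):
                problems.extend(check_value(item, item_schema, f"{path}[{index}]"))
    elif expected == "object":
        if not isinstance(value, dict):
            problems.append(f"{path}: expected object, got {type(value).__name__}")
        else:
            for name, sub_schema in schema.get("properties", {}).items():
                if name in value:  # absent fields are allowed; nothing is required here
                    problems.extend(check_value(value[name], sub_schema, f"{path}.{name}"))
    return problems
```

For example, `check_value({"title": "Engineer", "requirements": ["Clojure", 5]}, JOB_SCHEMA)` reports one problem, at `$.requirements[1]`, because the second requirement is not a string.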

Integration with Other Skills

Combine with Web Search

Use web_search to find relevant URLs
Use Tabstack to extract structured data from those URLs
Store results in Datalevin (future skill)

Combine with Browser Automation

Use browser tool to navigate complex sites
Extract page URLs
Use Tabstack for structured extraction

Error Handling

Common issues and solutions:

Authentication failed - Check that TABSTACK_API_KEY is set, or that ~/.config/tabstack/config.edn contains a valid :api-key
Invalid URL - Ensure URL is accessible and correct
Schema mismatch - Adjust schema to match page structure
Rate limiting - Add delays between requests
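The advice to add delays can be combined with retries. The Python sketch below assumes the policy mentioned for `json-retry` in Quick Start (3 retries, 1-second delay). `with_retries` and the `operation` callable are hypothetical and stand in for whatever performs the request.

```python
import time


def with_retries(operation, retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Call operation(); on failure, wait and try again, up to `retries` retries."""
    last_error = None
    for attempt in range(retries + 1):  # one initial attempt plus `retries` retries
        try:
            return operation()
        except Exception as error:  # a real wrapper would catch narrower error types
            last_error = error
            if attempt < retries:
                sleep(delay_seconds)  # fixed delay between attempts
    raise last_error
```

Authentication failures and invalid URLs will not succeed on a second attempt, so a production version would likely retry only on timeouts and rate-limit responses.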

Resources

scripts/

tabstack.clj - Main API wrapper in Babashka (recommended, has retry logic, caching, batch processing)
tabstack_curl.sh - Bash/curl fallback (simple, no dependencies)
tabstack_api.py - Python API wrapper (requires requests module)

references/

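job_schema.json - Template schema for job listings
simple_article.json - Simple article schema used in Quick Start (4 basic fields)
news_schema.json - More complex news schema used in Quick Start
api_reference.md - Tabstack API documentation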

Best Practices

Start small - Test with single pages before scaling
Respect robots.txt - Check site scraping policies
Add delays - Avoid overwhelming target sites
Validate schemas - Test schemas on sample pages
Handle errors gracefully - Implement retry logic for failed requests
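The robots.txt check can be automated with Python's standard library. This sketch assumes the robots.txt content has already been downloaded as text; `allowed_by_robots` is a hypothetical helper, and the `"*"` user agent is a placeholder for whatever agent string applies to your requests.

```python
from urllib import robotparser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt content that was fetched separately."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() takes an iterable of lines
    return parser.can_fetch(user_agent, url)
```

A robots.txt check covers only crawler rules; a site's terms of service may impose further restrictions that still need to be read by a person.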

Teaching Focus: How to Create Schemas

This skill is designed to teach agents how to use the Tabstack API effectively. The key is learning to create appropriate JSON schemas for different websites.

Learning Path

Start Simple - Use references/simple_article.json (4 basic fields)
Test Extensively - Try schemas on multiple page types
Iterate - Add fields based on what the page actually contains
Optimize - Remove unnecessary fields for speed

See Schema Creation Guide for detailed instructions and examples.

Common Mistakes to Avoid

Over-complex schemas - Start with 2-3 fields, not 20
Missing fields - Don't require fields that don't exist on the page
No testing - Always test with example.com first, then target sites
Ignoring timeouts - Complex schemas take longer (45s timeout)
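Some of these mistakes can be caught mechanically before a schema is sent anywhere. The Python sketch below is a heuristic lint. The `max_fields` threshold of 5 is an arbitrary assumption, and `lint_schema` is a hypothetical helper rather than part of the skill.

```python
def lint_schema(schema, max_fields=5):
    """Return warnings for over-complex schemas and risky required fields."""
    warnings = []
    properties = schema.get("properties", {})
    required = schema.get("required", [])
    if len(properties) > max_fields:
        warnings.append(
            f"schema has {len(properties)} fields; start with {max_fields} or fewer"
        )
    for name in required:
        if name not in properties:
            warnings.append(f"required field '{name}' is not defined in properties")
    if required:
        warnings.append("required fields may not exist on every page")
    return warnings
```

An empty list means none of the heuristics fired; it does not guarantee the schema matches the target page, so testing on a sample page is still needed.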

Babashka Advantages

Using Babashka for this skill provides:

Single binary - Easy to share/install (GitHub releases, brew, nix)
Fast startup - No JVM warmup, ~50ms startup time
Built-in HTTP client - No external dependencies
Clojure syntax - Expressive, and familiar to Clojure users
Retry logic & caching - Built into the skill
Batch processing - Parallel extraction for multiple URLs

Example User Requests

Requests like these should trigger this skill:

"Scrape job listings from Docker careers page"
"Extract the main content from this article"
"Get structured product data from this e-commerce page"
"Pull all the news articles from this site"
"Extract contact information from this company page"
"Batch extract job listings from these 20 URLs"
"Get cached results for this page (avoid API calls)"
