83
1
Source

Web scraping and automation platform with pre-built Actors for common tasks

Install

mkdir -p .claude/skills/apify && curl -L -o skill.zip "https://mcp.directory/api/skills/download/484" && unzip -o skill.zip -d .claude/skills/apify && rm skill.zip

Installs to .claude/skills/apify

About this skill

Apify

Web scraping and automation platform. Run pre-built Actors (scrapers) or create your own. Access thousands of ready-to-use scrapers for popular websites.

Official docs: https://docs.apify.com/api/v2


When to Use

Use this skill when you need to:

  • Scrape data from websites (Amazon, Google, LinkedIn, Twitter, etc.)
  • Run pre-built web scrapers without coding
  • Extract structured data from any website
  • Automate web tasks at scale
  • Store and retrieve scraped data

Prerequisites

  1. Create an account at https://apify.com/
  2. Get your API token from https://console.apify.com/account#/integrations

Set environment variable:

export APIFY_API_TOKEN="apify_api_xxxxxxxxxxxxxxxxxxxxxxxx"

Important: When using $VAR in a command that pipes to another command, wrap the command containing $VAR in bash -c '...'. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'

How to Use

1. Run an Actor (Async)

Start an Actor run asynchronously:

Write to /tmp/apify_request.json:

{
  "startUrls": [{"url": "https://example.com"}],
  "maxPagesPerCrawl": 10,
  "pageFunction": "async function pageFunction(context) { const { request, log, jQuery } = context; const $ = jQuery; const title = $(\"title\").text(); return { url: request.url, title }; }"
}

Then run:

bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'

Response contains id (run ID) and defaultDatasetId for fetching results.

2. Run Actor Synchronously

Wait for completion and get results directly (max 5 min):

Write to /tmp/apify_request.json:

{
  "startUrls": [{"url": "https://news.ycombinator.com"}],
  "maxPagesPerCrawl": 1,
  "pageFunction": "async function pageFunction(context) { const { request, log, jQuery } = context; const $ = jQuery; const title = $(\"title\").text(); return { url: request.url, title }; }"
}

Then run:

bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'

3. Check Run Status

⚠️ Important: The {runId} below is a placeholder - replace it with the actual run ID from your async run response (found in .data.id). See the complete workflow example below.

Poll the run status:

# Replace {runId} with actual ID like "HG7ML7M8z78YcAPEB"
bash -c 'curl -s "https://api.apify.com/v2/actor-runs/{runId}" --header "Authorization: Bearer ${APIFY_API_TOKEN}"' | jq -r '.data.status'

Complete workflow example (capture run ID and check status):

Write to /tmp/apify_request.json:

{
  "startUrls": [{"url": "https://example.com"}],
  "maxPagesPerCrawl": 10
}

Then run:

# Step 1: Start an async run and capture the run ID
RUN_ID=$(bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json' | jq -r '.data.id')

# Step 2: Check the run status
bash -c "curl -s \"https://api.apify.com/v2/actor-runs/${RUN_ID}\" --header \"Authorization: Bearer \${APIFY_API_TOKEN}\"" | jq '.data.status'

Statuses: READY, RUNNING, SUCCEEDED, FAILED, ABORTED, TIMED-OUT

4. Get Dataset Items

⚠️ Important: The {datasetId} below is a placeholder - do not use it literally! You must replace it with the actual dataset ID from your run response (found in .data.defaultDatasetId). See the complete workflow example below for how to capture and use the real ID.

Fetch results from a completed run:

# Replace {datasetId} with actual ID like "WkzbQMuFYuamGv3YF"
bash -c 'curl -s "https://api.apify.com/v2/datasets/{datasetId}/items" --header "Authorization: Bearer ${APIFY_API_TOKEN}"'

Complete workflow example (run async, wait, and fetch results):

Write to /tmp/apify_request.json:

{
  "startUrls": [{"url": "https://example.com"}],
  "maxPagesPerCrawl": 10
}

Then run:

# Step 1: Start async run and capture IDs
RESPONSE=$(bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json')

RUN_ID=$(echo "$RESPONSE" | jq -r '.data.id')
DATASET_ID=$(echo "$RESPONSE" | jq -r '.data.defaultDatasetId')

# Step 2: Wait for completion (poll status)
while true; do
  STATUS=$(bash -c "curl -s \"https://api.apify.com/v2/actor-runs/${RUN_ID}\" --header \"Authorization: Bearer \${APIFY_API_TOKEN}\"" | jq -r '.data.status')
  echo "Status: $STATUS"
  [[ "$STATUS" == "SUCCEEDED" ]] && break
  [[ "$STATUS" == "FAILED" || "$STATUS" == "ABORTED" ]] && exit 1
  sleep 5
done

# Step 3: Fetch the dataset items
bash -c "curl -s \"https://api.apify.com/v2/datasets/${DATASET_ID}/items\" --header \"Authorization: Bearer \${APIFY_API_TOKEN}\""

With pagination:

# Replace {datasetId} with actual ID
bash -c 'curl -s "https://api.apify.com/v2/datasets/{datasetId}/items?limit=100&offset=0" --header "Authorization: Bearer ${APIFY_API_TOKEN}"'

5. Popular Actors

Google Search Scraper

Write to /tmp/apify_request.json:

{
  "queries": "web scraping tools",
  "maxPagesPerQuery": 1,
  "resultsPerPage": 10
}

Then run:

bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?timeout=120" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'

Website Content Crawler

Write to /tmp/apify_request.json:

{
  "startUrls": [{"url": "https://docs.example.com"}],
  "maxCrawlPages": 10,
  "crawlerType": "cheerio"
}

Then run:

bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~website-content-crawler/run-sync-get-dataset-items?timeout=300" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'

Instagram Scraper

Write to /tmp/apify_request.json:

{
  "directUrls": ["https://www.instagram.com/apaborotnikov/"],
  "resultsType": "posts",
  "resultsLimit": 10
}

Then run:

bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~instagram-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'

Amazon Product Scraper

Write to /tmp/apify_request.json:

{
  "categoryOrProductUrls": [{"url": "https://www.amazon.com/dp/B0BSHF7WHW"}],
  "maxItemsPerStartUrl": 1
}

Then run:

bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/junglee~amazon-crawler/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'

6. List Your Runs

Get recent Actor runs:

bash -c 'curl -s "https://api.apify.com/v2/actor-runs?limit=10&desc=true" --header "Authorization: Bearer ${APIFY_API_TOKEN}"' | jq '.data.items[] | {id, actId, status, startedAt}'

7. Abort a Run

⚠️ Important: The {runId} below is a placeholder - replace it with the actual run ID. See the complete workflow example below.

Stop a running Actor:

# Replace {runId} with actual ID like "HG7ML7M8z78YcAPEB"
bash -c 'curl -s -X POST "https://api.apify.com/v2/actor-runs/{runId}/abort" --header "Authorization: Bearer ${APIFY_API_TOKEN}"'

Complete workflow example (start a run and abort it):

Write to /tmp/apify_request.json:

{
  "startUrls": [{"url": "https://example.com"}],
  "maxPagesPerCrawl": 100
}

Then run:

# Step 1: Start an async run and capture the run ID
RUN_ID=$(bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json' | jq -r '.data.id')

echo "Started run: $RUN_ID"

# Step 2: Abort the run
bash -c "curl -s -X POST \"https://api.apify.com/v2/actor-runs/${RUN_ID}/abort\" --header \"Authorization: Bearer \${APIFY_API_TOKEN}\""

8. List Available Actors

Browse public Actors:

bash -c 'curl -s "https://api.apify.com/v2/store?limit=20&category=ECOMMERCE" --header "Authorization: Bearer ${APIFY_API_TOKEN}"' | jq '.data.items[] | {name, username, title}'

Popular Actors Reference

Actor IDDescription
apify/web-scraperGeneral web scraper
apify/website-content-crawlerCrawl entire websites
apify/google-search-scraperGoogle search results
apify/instagram-scraperInstagram posts/profiles
junglee/amazon-crawlerAmazon products
apify/twitter-scraperTwitter/X posts
apify/youtube-scraperYouTube videos
apify/linkedin-scraperLinkedIn profiles
lukaskrivka/google-mapsGoogle Maps places

Find more at: https://apify.com/store


Run Options

ParameterTypeDescription
timeoutnumberRun timeout in seconds
memorynumberMemory in MB (128, 256, 512, 1024, 2048, 4096)
maxItemsnumberMax items to return (for sync endpoints)
buildstringActor build tag (default: "latest")
waitForFinishnumberWait time in seconds (for async runs)

Response Format

Run object:

{
  "data": {
  "id": "HG7ML7M8z78YcAPEB",
  "actId": "HDSasDasz78YcAPEB",
  "status": "SUCCEEDED",
  "startedAt": "2024-01-01T00:00:00.000Z",
  "finishedAt": "2024-01-01T00:01:00.000Z",
  "defaultDatasetId": "WkzbQMuFYuamGv3YF",
  "defaultKeyValueStoreId": "tbhFDFDh78YcAPEB"
  }
}

Guidelines

  1. Sync vs Async: Use run-sync-get-dataset-items for quick tasks (<5 min), async for longer jobs
  2. Rate Limits: 250,000 requests/min globally, 400/sec per resource
  3. Memory: Higher memory = faster execution but more credits
  4. Timeouts: Default varies by Actor; set explicit timeout for sync calls
  5. Pagination: Use limit and offset for large datasets
  6. Actor Input: Each Actor has different input schema - check Actor's page for details
  7. Credits: Check usage at https://console.apify.com/billing

You might also like

flutter-development

aj-geddes

Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.

289790

drawio-diagrams-enhanced

jgtolentino

Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.

213415

godot

bfollington

This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.

213296

nano-banana-pro

garg-aayush

Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.

222234

ui-ux-pro-max

nextlevelbuilder

"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."

173201

rust-coding-skill

UtakataKyosui

Guides Claude in writing idiomatic, efficient, well-structured Rust code using proper data modeling, traits, impl organization, macros, and build-speed best practices.

166173

Stay ahead of the MCP ecosystem

Get weekly updates on new skills and servers.