Updated June 2026Cookbook16 min read

Jina AI skill for Claude: 10 URL-to-Markdown recipes

Ten real workflows on Jina’s Reader and Search APIs — docs-to-context, web search grounding, RAG corpus ingestion, article briefs, changelog monitoring, search-then-read research, fact-checking, SPA reading, newsletter digests, and a raw-curl integration — each as one Claude prompt with the exact output it produces.

Already know what skills are? Skip to the cookbook. First time? Read the explainer then come back. Need the install? It’s on the /skills/jina-ai page.

Editorial illustration: a browser-window glyph and a markdown-document glyph connected by a luminous teal flow arc, with a small magnifying-glass accent, on a midnight navy background.
On this page · 21 sections
  1. What this skill does
  2. The cookbook
  3. Install + README
  4. Watch it used
  5. 01 · Turn any docs page into Claude context
  6. 02 · Ground an answer with live web search
  7. 03 · Build a RAG corpus from a list of URLs
  8. 04 · Summarize a long article into a decision brief
  9. 05 · Monitor a competitor's changelog or pricing page
  10. 06 · Search-then-read research pipeline
  11. 07 · Fact-check a claim before you publish it
  12. 08 · Read JavaScript-heavy pages that curl can't
  13. 09 · Digest every link in a newsletter or awesome-list
  14. 10 · Call Reader directly when the script is overkill
  15. Community signal
  16. The contrarian take
  17. Real projects on Jina Reader
  18. Gotchas
  19. Pairs well with
  20. FAQ
  21. Sources

What this skill actually does

Sixty seconds of context before the cookbook — what the jina-ai skill is, what Claude returns when you invoke it, and the one thing it does NOT do for you.

What this skill actually does

Use Jina AI APIs for converting URLs to LLM-friendly Markdown (Reader) and searching the web (Search).

closedloop-technologies, the skill author · /skills/jina-ai

What Claude returns

Claude runs `python3 scripts/jina_tools.py read "<url>"` to fetch any page through r.jina.ai and `python3 scripts/jina_tools.py search "<query>"` to query the web through s.jina.ai. Both return JSON with `title`, `url`, and `content_markdown` — the page body already converted to Markdown. The script sends `X-Return-Format: json` and adds a Bearer header automatically when `JINA_API_KEY` is set in `.env`. Setup is `pip install requests python-dotenv`.

What it does NOT do

It does not bypass logins or hard paywalls, and it does not crawl — it reads URLs you give it, one at a time.

How you trigger it

read this URL and answer from its contentsearch the web for X and cite the sourcesconvert these docs pages to markdown for our RAG corpus

Cost when idle

~100 tokens at idle (name + description in the system prompt). The script runs only when triggered.

The skill wraps two Jina AI endpoints that have become the default plumbing for LLM web access. r.jina.ai is a URL prefix: put it before any address and the page comes back as clean Markdown — nav, ads, and cookie banners stripped, content intact. s.jina.ai does the same for web search, returning the top results with their full page content already converted. The whole point is token economics: the same page costs roughly 8x fewer tokens as Markdown than as HTML, which is the difference between fitting nine sources in context and fitting one.

The cookbook

Each entry below is a workflow you could run today. They go in the order I’d teach them — single read, single search, then the composed pipelines (corpus ingestion, search-then-read, fact-checking) that make the skill worth installing over an ad-hoc curl. Every entry pairs with one or two skills or MCP servers from mcp.directory; for the broader search landscape, see the best web-search MCP servers.

Install + README

If the skill isn’t on your machine yet, here’s the one-liner. The full install panel (Codex, Copilot, Antigravity variants) is on the skill page. One real setup step after install: pip install requests python-dotenv, and optionally a JINA_API_KEY in .env.

One-line install · by closedloop-technologies

Open skill page

Install

mkdir -p .claude/skills/jina-ai && curl -L -o skill.zip "https://mcp.directory/api/skills/download/398" && unzip -o skill.zip -d .claude/skills/jina-ai && rm skill.zip

Installs to .claude/skills/jina-ai

Watch it used

A Reader API walkthrough is the right primer — you want the input/output shape anchored before reading ten prompts that exploit it. This one frames Reader exactly where the cookbook does: as the ingestion step for agents and RAG.

01

Turn any docs page into Claude context

Feed Claude a live documentation page as clean Markdown instead of pasting HTML soup or screenshots. One read call, then ask questions against the actual text.

ForAnyone whose question starts with 'the docs say...' — library upgrades, API migrations, config debugging.

The prompt

Use the jina-ai skill. Read https://docs.python.org/3/library/asyncio-task.html with `python3 scripts/jina_tools.py read` and answer from the fetched content only: what is the difference between asyncio.create_task and asyncio.ensure_future, and which one does the page recommend for new code? Quote the exact sentence you based the answer on.

What slides.md looks like

$ python3 scripts/jina_tools.py read "https://docs.python.org/3/library/asyncio-task.html"
{
  "title": "Coroutines and Tasks — Python 3 documentation",
  "url": "https://docs.python.org/3/library/asyncio-task.html",
  "content_markdown": "## Creating Tasks\n\nasyncio.create_task(coro, *,
    name=None, context=None)\nWrap the coro coroutine into a Task and
    schedule its execution..."
}

One-line tweak

Swap the URL for your internal docs site — Reader renders any publicly reachable page, including ones behind JavaScript routing.

02

Ground an answer with live web search

Stop Claude from answering a time-sensitive question from training data. One s.jina.ai search returns the top results as full Markdown, so the answer cites pages, not memory.

ForQuestions where 'as of my knowledge cutoff' is an unacceptable answer — releases, deprecations, breaking changes.

The prompt

Use the jina-ai skill. Run `python3 scripts/jina_tools.py search "Node.js LTS schedule current active version"` and answer strictly from the results: which Node major versions are in Active LTS and Maintenance right now? List each claim with the source URL it came from. If the results disagree, say so instead of picking one.

What slides.md looks like

$ python3 scripts/jina_tools.py search "Node.js LTS schedule current active version"
[
  {
    "title": "Node.js — Releases",
    "url": "https://nodejs.org/en/about/previous-releases",
    "content_markdown": "## Release schedule\n\nMajor Node.js versions enter
      Current release status for six months..."
  },
  { "title": "...", "url": "...", "content_markdown": "..." }
]

One-line tweak

Append 'site:github.com' style operators to the query string — s.jina.ai passes the query to a real SERP, so search operators work.

03

Build a RAG corpus from a list of URLs

Convert 50 documentation pages into individual Markdown files ready for chunking and embedding. The skill loops Reader over a URL list and writes one .md per page.

ForAnyone bootstrapping a RAG index over docs they don't own — no crawler config, no HTML parsing.

The prompt

Use the jina-ai skill. I have urls.txt with one docs URL per line (about 50 lines). Write and run a bash loop that calls `python3 scripts/jina_tools.py read` on each URL, extracts content_markdown with jq, and saves it to corpus/<slugified-hostname-path>.md. Sleep 4 seconds between calls so we stay under the keyless rate limit. Print a final count of files written and flag any URL that returned empty content.

What slides.md looks like

while IFS= read -r url; do
  slug=$(printf '%s' "$url" | sed -E 's|https?://||; s|[^A-Za-z0-9]+|-|g')
  python3 scripts/jina_tools.py read "$url" \
    | jq -r '.content_markdown' > "corpus/$slug.md"
  sleep 4
done < urls.txt
ls corpus | wc -l   # 50

One-line tweak

Set JINA_API_KEY in .env first and drop the sleep — the keyed rate limit is an order of magnitude higher than the anonymous one.

04

Summarize a long article into a decision brief

A 6,000-word engineering blog post becomes a half-page brief: claims, evidence, what to do about it. Reader strips nav, ads, and comments so the summary works from signal only.

ForTech leads triaging 'have you seen this post?' links without spending 25 minutes per link.

The prompt

Use the jina-ai skill. Read this article with `python3 scripts/jina_tools.py read <url>`, then produce a brief with exactly four sections: TL;DR (2 sentences), Core claims (bullet list, each with the supporting evidence the author gives), What's opinion vs. measurement (split them), and Action for our team (1-2 sentences, assume we run a Postgres-backed Rails monolith). Do not pad. If the article gives no measurements, say so under evidence.

What slides.md looks like

## TL;DR
The author migrated from sidekiq to pg-backed jobs and cut infra
spend; the win came from deleting Redis, not from job throughput.

## Core claims
- Queue latency stayed under 100ms at their scale (measured, p95 graph)
- Postgres SKIP LOCKED handles their 2k jobs/min (measured)

## Opinion vs. measurement
Opinion: "Redis is operational overhead you don't need."

One-line tweak

Change the final section's context line ('assume we run...') — it's the difference between a generic summary and a brief someone acts on.

05

Monitor a competitor's changelog or pricing page

Snapshot a page as Markdown on a schedule, commit it to git, and let the diff tell you what changed. Reader normalizes the HTML so diffs show content changes, not markup churn.

ForPMs and founders who currently 'check the competitor's site sometimes' and miss the week it mattered.

The prompt

Use the jina-ai skill. Write snapshot.sh: it reads https://example-competitor.com/changelog and /pricing via `python3 scripts/jina_tools.py read`, writes each content_markdown to snapshots/changelog.md and snapshots/pricing.md, then runs git add + git commit -m "snapshot $(date +%F)". Then run it once and show me `git diff HEAD~1 -- snapshots/` if a previous snapshot exists. Summarize any diff in two sentences.

What slides.md looks like

$ git diff HEAD~1 -- snapshots/pricing.md
-| Team    | $49/mo  | 5 seats  |
+| Team    | $59/mo  | 5 seats  |
+| Scale   | $199/mo | 25 seats, SSO |

# Claude: "Team tier price increased and a new Scale tier
# with SSO appeared — their first enterprise motion."

One-line tweak

Run snapshot.sh from cron or CI weekly; the prompt only needs to be re-run when you want the diff interpreted.

06

Search-then-read research pipeline

The two operations composed: search finds candidate sources, read pulls each one in full, and Claude writes the comparison from complete pages instead of search snippets.

ForResearch questions where the answer lives in the third paragraph of page two — snippets alone get it wrong.

The prompt

Use the jina-ai skill. Question: how do Postgres logical replication and pglogical differ for zero-downtime major-version upgrades? First run `python3 scripts/jina_tools.py search` on the question. Pick the 3 most authoritative result URLs (prefer postgresql.org and maintainer blogs), read each with the read command, and write a comparison: capabilities table, failure modes each source warns about, and a recommendation. Cite which source each row came from.

What slides.md looks like

python3 scripts/jina_tools.py search \
  "postgres logical replication vs pglogical major upgrade" > hits.json
jq -r '.[].url' hits.json | head -3 | while IFS= read -r u; do
  python3 scripts/jina_tools.py read "$u" >> sources.jsonl
done
# 3 full pages → one cited comparison table

One-line tweak

Raise 'pick the 3 most authoritative' to 5 for contested topics — disagreement between sources is the finding.

07

Fact-check a claim before you publish it

A claim goes in, a verdict comes out: supported, contradicted, or unverifiable — with the exact passages that decide it. Search grounding makes the check reproducible.

ForAnyone shipping content where a wrong stat costs credibility — blog posts, decks, docs.

The prompt

Use the jina-ai skill. Claim to verify: "SQLite is the most widely deployed database engine in the world." Search for it with `python3 scripts/jina_tools.py search`, read the two most authoritative results in full, and return a verdict block: VERDICT (supported / contradicted / unverifiable), the verbatim passage from each source that supports the verdict, the source URLs, and one sentence on how confident I should be. Do not soften the verdict.

What slides.md looks like

VERDICT: supported
Source 1: https://www.sqlite.org/mostdeployed.html
  "SQLite is likely used more than all other database
   engines combined."
Source 2: independent coverage repeating the claim with
  the same first-party origin.
Confidence: high for "most deployed", noting the primary
source is the SQLite project itself.

One-line tweak

Batch it: paste five claims and ask for five verdict blocks — the per-claim cost is one search plus two reads.

08

Read JavaScript-heavy pages that curl can't

SPAs return an empty <div id="root"> to plain HTTP fetches. Reader renders the page in a headless browser first, so client-side content comes back as Markdown.

ForAnyone who has piped a React app into curl and received 1 KB of nothing.

The prompt

Use the jina-ai skill. This page is a client-rendered SPA and curl returns an empty shell: https://app.example.com/public/changelog. Fetch it with `python3 scripts/jina_tools.py read` instead and confirm you got real content by printing the first three headings from content_markdown. Then summarize the last five releases listed.

What slides.md looks like

$ curl -s https://app.example.com/public/changelog | wc -c
1204    # <div id="root"></div> and a script tag

$ python3 scripts/jina_tools.py read "https://app.example.com/public/changelog" \
    | jq -r '.content_markdown' | grep '^#' | head -3
## v2.41 — Edge caching for media
## v2.40 — SSO domain enforcement
## v2.39 — Bulk export API

One-line tweak

If a page needs a click or login before content appears, Reader can't help — that's headless-browser-with-a-session territory.

09

Digest every link in a newsletter or awesome-list

Twelve links become an annotated bibliography: one verdict line per link, ranked by relevance to what you're building. Claude reads all of them; you read one paragraph.

ForDevelopers with 40 open tabs from three newsletters and zero memory of why.

The prompt

Use the jina-ai skill. Here are the 12 links from this week's newsletter: [paste links]. Read each with `python3 scripts/jina_tools.py read`. Return an annotated bibliography sorted by relevance to building LLM agents in TypeScript: for each link, one line — title, 15-word summary, and a tag from {read-now, skim, skip}. End with the single link I should read first and why, in one sentence.

What slides.md looks like

1. [read-now] Streaming tool calls without buffering —
   pattern for incremental tool-call parsing in TS agents.
2. [skim] Postmortem: agent loop burned $400 — retry logic
   lesson, applies to any provider.
3. [skip] "Top 10 AI tools" — listicle, no code.
...
Read first: #1 — it solves the exact buffering bug in
your current agent runner.

One-line tweak

Change the relevance anchor ('building LLM agents in TypeScript') per run — the same links rank differently for different projects.

10

Call Reader directly when the script is overkill

The skill's pattern without the skill's Python: r.jina.ai is a URL prefix, so any HTTP client is an integration. Useful inside CI, other agents, or a five-line script of your own.

ForAnyone wiring URL-to-Markdown into an existing pipeline that already speaks HTTP.

The prompt

Use the jina-ai skill as reference. Write me a minimal fetch_markdown function for our TypeScript agent that GETs https://r.jina.ai/<target-url> with an Authorization: Bearer header from JINA_API_KEY and X-Return-Format: json, parses {data: {title, url, content}}, and throws a typed error on HTTP 451 (rate-limit/abuse) telling the caller to add an API key. Include the curl equivalent in a comment for debugging.

What slides.md looks like

# the whole integration, as curl:
curl "https://r.jina.ai/https://example.com/post" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Return-Format: json"

{ "data": { "title": "...", "url": "...",
    "content": "## The post, as Markdown..." } }

One-line tweak

Drop X-Return-Format: json to get raw Markdown back — simpler when you're piping straight into a prompt.

Community signal

Three voices from people using Jina Reader for real. The first two are Simon Willison — the most credible independent endorsement this tool category has — and the third is the free-key discovery that fixes most rate-limit complaints.

Jina.ai offer a really neat (currently free) API for this - you add https://r.jina.ai/ on the beginning of any API and it gives you back a Markdown version of the main content of that page, suitable for piping into an LLM.

simonw (Simon Willison, HN) · Hacker News

The clearest third-party endorsement on the table — Willison has been recommending the prefix trick across multiple HN threads since 2024.

Source
Right: the token savings can be enormous here. Use https://tools.simonwillison.net/jina-reader to fetch the https://news.ycombinator.com/ homepage as Markdown and paste it into https://tools.simonwillison.net/claude-token-counter - 1550 tokens. Same thing as HTML: 13367 tokens.

simonw (Simon Willison, HN) · Hacker News

The measured case for Markdown conversion: the same page costs roughly 8x fewer tokens as Markdown than as raw HTML. This is the economics underneath every use case below.

Source
TIL Jina Reader API has (low rate limit) API key free option. https://jina.ai/reader/

flakiness (HN) · Hacker News

The free-key discovery moment — the keyless tier is throttled hard, and a free key changes the experience.

Source

The contrarian take

Not everyone has a smooth ride. The most useful critique comes from djoldman (HN), who hit the anonymous tier’s abuse wall mid-session:

jina is getting cranky: ... “SecurityCompromiseError: Your request is categorized as abuse. Please don't abuse our service. If you are sure you are not abusing, please authenticate yourself with an API key.”

djoldman (HN) · Hacker News

From the HN thread where simonw's Jina-backed tool started erroring for other users.

Source

Fair. The keyless tier is throttled hard, and its abuse heuristics throw HTTP 451 on traffic that looks bursty — which a 50-URL ingestion loop absolutely does. The error message contains its own fix: authenticate. A free key raises the rate limit by an order of magnitude, and the skill’s setup writes it to .env in one step. If your workload can’t tolerate a third-party throttle at all, the reader is open source and self-hostable. My take: free tiers that fail loudly with an actionable error are fine; plan for the key from day one.

One more alternative worth naming: there is a Jina AI MCP server (/servers/jina-ai) exposing the same web reading, search, and fact-checking as MCP tools. The trade-off is the usual skill-vs-MCP one: the skill costs ~100 idle tokens and runs a local script; the MCP server’s tool schemas load every turn but work from any MCP client — Cursor, Claude Desktop, custom agents. Inside Claude Code, the skill is the cheaper composition. Outside it, the server is the only option.

Real projects on Jina Reader

Concrete examples from the wild. None used this Claude skill specifically — they’re here so you have a target shape in mind. The recurring theme is the same one as the cookbook: Reader as the ingestion layer, something else doing the thinking.

Gotchas (the four that bite)

Sourced from the HN threads above and the skill’s own script. You’ll hit at least one of these in your first week.

HTTP 451 SecurityCompromiseError on burst traffic

The anonymous tier categorizes loops and bursts as abuse and refuses with 451. Use case 3 sleeps between calls for exactly this reason. The durable fix is a free JINA_API_KEY in .env — the skill's setup script writes it and the Python adds the Bearer header automatically.

Search returns full pages, and full pages cost tokens

s.jina.ai comes back with the top results' complete content as Markdown — five pages of it. That's the feature, but it means one search can be tens of thousands of tokens. Ask Claude to select before it reads everything into the conversation (use case 6 picks 3 of the hits, deliberately).

The script expects JSON; raw mode breaks it

jina_tools.py sends X-Return-Format: json and parses {data: {title, url, content}}. If you call r.jina.ai yourself without that header you get a raw Markdown stream instead — fine for piping into a prompt, wrong for anything that jq's the output. Use case 10 shows both modes.

Logins, hard paywalls, and interactions are out of scope

Reader renders what an anonymous visitor sees — that covers SPAs and JS-rendered content (use case 8), not session-gated content. If the page needs a click, a cookie, or credentials, hand off to a Playwright-class tool and come back to Jina for the conversion step.

Pairs well with

Curated to match the cookbook’s actual integrations: the research skills that consume what Jina fetches (deep-research, gpt-researcher, web-research), the scraping alternatives for when reading isn’t enough (firecrawl-scraper, crawl4ai), and the search and storage MCP servers the longer use cases (3, 6, 7) lean on. If you’re choosing between scraping stacks, read the Crawlee vs Apify vs Firecrawl vs Spider vs ScrapeGraph comparison first.

Two posts that compose well with this cookbook: the Perplexity skill guide covers the search-plus-synthesis alternative when you’d rather not orchestrate read calls yourself, and the best web-search MCP servers for 2026 maps the whole search-tool landscape the pairings above come from.

Frequently asked questions

What does the Jina AI skill for Claude actually do?

Two operations. `read` converts any URL into LLM-friendly Markdown via Jina's r.jina.ai Reader API; `search` queries the web via s.jina.ai and returns the top results with their full content as Markdown. Both come back as JSON with title, url, and content_markdown, which Claude then reasons over. It's a ~100-line Python script, not a service — the heavy lifting happens on Jina's API.

Is Jina Reader free? Do I need an API key?

Reader works without any API key at a low anonymous rate limit, and the skill runs fine that way for occasional reads. A free API key from jina.ai raises the rate limit substantially and comes with a free token grant. For anything batchy (the RAG corpus recipe, the newsletter digest), get the key — the anonymous tier's abuse heuristics will throttle you mid-loop otherwise.

Jina AI skill vs Firecrawl: which one for scraping?

Different jobs. Jina Reader reads URLs you already know — one page in, Markdown out. Firecrawl crawls: give it a site and it discovers pages, follows links, and extracts structured data against a schema. If your input is a list of URLs, the Jina skill is lighter and the free tier covers it. If your input is 'this whole site', use the firecrawl MCP server — our scraping comparison covers the trade-offs in depth.

Is there a Jina AI MCP server, or only the skill?

Both exist. The jina-ai MCP server (mcp-jina-ai) exposes Jina's web reading, search, and fact-checking as MCP tools for any MCP client — Cursor, Claude Desktop, anything. The skill is the cheaper composition inside Claude Code: ~100 idle tokens versus tool schemas loading every turn. Pick the MCP server when a non-Claude-Code client needs the same capability; pick the skill when Claude Code is doing the work.

Can Jina Reader handle JavaScript-heavy pages and paywalls?

JavaScript-heavy pages, yes — Reader renders pages in a headless browser before converting, so SPAs that return an empty shell to curl come back with real content (use case 8). Hard paywalls and login walls, no — Reader fetches what an anonymous visitor can see. If content needs a session or a click to appear, you want a Playwright-class tool instead.

Why does r.jina.ai return a SecurityCompromiseError (HTTP 451)?

The anonymous tier categorizes bursty traffic as abuse and refuses with HTTP 451 — an HN user hit exactly this and the error message itself says the fix: authenticate with an API key. Put JINA_API_KEY in .env (the skill's setup script does this) and the error disappears for normal workloads. If you need hard guarantees, the reader is open source and self-hostable.

Why does Markdown matter? Can't Claude just read HTML?

It can, expensively. Simon Willison measured the Hacker News homepage at 13,367 tokens as HTML versus 1,550 as Markdown — roughly 8x. Markdown also strips nav, ads, and boilerplate, so the model reasons over signal instead of markup. For multi-page workflows (use cases 3, 6, 9), that difference is what keeps the run inside the context window.

Sources

Primary

Community

Critical and contrarian

Internal

Keep reading