Jina AI skill for Claude: 10 URL-to-Markdown recipes

jina-ai

Same Reader API as MCP tools, for clients where the skill can't run Python.

deep-research

Chains multiple read calls into a structured multi-source report.

Ground an answer with live web search

Stop Claude from answering a time-sensitive question from training data. One s.jina.ai search returns the top results as full Markdown, so the answer cites pages, not memory.

ForQuestions where 'as of my knowledge cutoff' is an unacceptable answer — releases, deprecations, breaking changes.

The prompt

Use the jina-ai skill. Run `python3 scripts/jina_tools.py search "Node.js LTS schedule current active version"` and answer strictly from the results: which Node major versions are in Active LTS and Maintenance right now? List each claim with the source URL it came from. If the results disagree, say so instead of picking one.

What slides.md looks like

$ python3 scripts/jina_tools.py search "Node.js LTS schedule current active version"
[
  {
    "title": "Node.js — Releases",
    "url": "https://nodejs.org/en/about/previous-releases",
    "content_markdown": "## Release schedule\n\nMajor Node.js versions enter
      Current release status for six months..."
  },
  { "title": "...", "url": "...", "content_markdown": "..." }
]

One-line tweak

Append 'site:github.com' style operators to the query string — s.jina.ai passes the query to a real SERP, so search operators work.

Pairs with

brave-search

Cheaper when you only need links and snippets, not full page content.

perplexity

Sonar does search + synthesis in one API call when you'd rather not orchestrate.

Build a RAG corpus from a list of URLs

Convert 50 documentation pages into individual Markdown files ready for chunking and embedding. The skill loops Reader over a URL list and writes one .md per page.

ForAnyone bootstrapping a RAG index over docs they don't own — no crawler config, no HTML parsing.

The prompt

Use the jina-ai skill. I have urls.txt with one docs URL per line (about 50 lines). Write and run a bash loop that calls `python3 scripts/jina_tools.py read` on each URL, extracts content_markdown with jq, and saves it to corpus/<slugified-hostname-path>.md. Sleep 4 seconds between calls so we stay under the keyless rate limit. Print a final count of files written and flag any URL that returned empty content.

What slides.md looks like

while IFS= read -r url; do
  slug=$(printf '%s' "$url" | sed -E 's|https?://||; s|[^A-Za-z0-9]+|-|g')
  python3 scripts/jina_tools.py read "$url" \
    | jq -r '.content_markdown' > "corpus/$slug.md"
  sleep 4
done < urls.txt
ls corpus | wc -l   # 50

One-line tweak

Set JINA_API_KEY in .env first and drop the sleep — the keyed rate limit is an order of magnitude higher than the anonymous one.

Pairs with

qdrant

Embed and store the corpus/ files for retrieval in the next session.

crawl4ai

Better when you need to discover URLs by crawling, not just convert a known list.

Summarize a long article into a decision brief

A 6,000-word engineering blog post becomes a half-page brief: claims, evidence, what to do about it. Reader strips nav, ads, and comments so the summary works from signal only.

ForTech leads triaging 'have you seen this post?' links without spending 25 minutes per link.

The prompt

Use the jina-ai skill. Read this article with `python3 scripts/jina_tools.py read <url>`, then produce a brief with exactly four sections: TL;DR (2 sentences), Core claims (bullet list, each with the supporting evidence the author gives), What's opinion vs. measurement (split them), and Action for our team (1-2 sentences, assume we run a Postgres-backed Rails monolith). Do not pad. If the article gives no measurements, say so under evidence.

What slides.md looks like

## TL;DR
The author migrated from sidekiq to pg-backed jobs and cut infra
spend; the win came from deleting Redis, not from job throughput.

## Core claims
- Queue latency stayed under 100ms at their scale (measured, p95 graph)
- Postgres SKIP LOCKED handles their 2k jobs/min (measured)

## Opinion vs. measurement
Opinion: "Redis is operational overhead you don't need."

One-line tweak

Change the final section's context line ('assume we run...') — it's the difference between a generic summary and a brief someone acts on.

Pairs with

notion

File the brief into the team's reading-log database in the same run.

content-trend-researcher

Aggregates many briefs into a trend view across weeks.

Monitor a competitor's changelog or pricing page

Snapshot a page as Markdown on a schedule, commit it to git, and let the diff tell you what changed. Reader normalizes the HTML so diffs show content changes, not markup churn.

ForPMs and founders who currently 'check the competitor's site sometimes' and miss the week it mattered.

The prompt

Use the jina-ai skill. Write snapshot.sh: it reads https://example-competitor.com/changelog and /pricing via `python3 scripts/jina_tools.py read`, writes each content_markdown to snapshots/changelog.md and snapshots/pricing.md, then runs git add + git commit -m "snapshot $(date +%F)". Then run it once and show me `git diff HEAD~1 -- snapshots/` if a previous snapshot exists. Summarize any diff in two sentences.

What slides.md looks like

$ git diff HEAD~1 -- snapshots/pricing.md
-| Team    | $49/mo  | 5 seats  |
+| Team    | $59/mo  | 5 seats  |
+| Scale   | $199/mo | 25 seats, SSO |

# Claude: "Team tier price increased and a new Scale tier
# with SSO appeared — their first enterprise motion."

One-line tweak

Run snapshot.sh from cron or CI weekly; the prompt only needs to be re-run when you want the diff interpreted.

Pairs with

github

Push snapshots to a repo so the history outlives your laptop.

web-research

Expands a single-page watch into a structured competitor profile.

Search-then-read research pipeline

The two operations composed: search finds candidate sources, read pulls each one in full, and Claude writes the comparison from complete pages instead of search snippets.

ForResearch questions where the answer lives in the third paragraph of page two — snippets alone get it wrong.

The prompt

Use the jina-ai skill. Question: how do Postgres logical replication and pglogical differ for zero-downtime major-version upgrades? First run `python3 scripts/jina_tools.py search` on the question. Pick the 3 most authoritative result URLs (prefer postgresql.org and maintainer blogs), read each with the read command, and write a comparison: capabilities table, failure modes each source warns about, and a recommendation. Cite which source each row came from.

What slides.md looks like

python3 scripts/jina_tools.py search \
  "postgres logical replication vs pglogical major upgrade" > hits.json
jq -r '.[].url' hits.json | head -3 | while IFS= read -r u; do
  python3 scripts/jina_tools.py read "$u" >> sources.jsonl
done
# 3 full pages → one cited comparison table

One-line tweak

Raise 'pick the 3 most authoritative' to 5 for contested topics — disagreement between sources is the finding.

Pairs with

exa-search

Semantic search surfaces sources keyword SERPs miss for conceptual queries.

gpt-researcher

Automates this exact search-read-synthesize loop at larger scale.

Fact-check a claim before you publish it

A claim goes in, a verdict comes out: supported, contradicted, or unverifiable — with the exact passages that decide it. Search grounding makes the check reproducible.

ForAnyone shipping content where a wrong stat costs credibility — blog posts, decks, docs.

The prompt

Use the jina-ai skill. Claim to verify: "SQLite is the most widely deployed database engine in the world." Search for it with `python3 scripts/jina_tools.py search`, read the two most authoritative results in full, and return a verdict block: VERDICT (supported / contradicted / unverifiable), the verbatim passage from each source that supports the verdict, the source URLs, and one sentence on how confident I should be. Do not soften the verdict.

What slides.md looks like

VERDICT: supported
Source 1: https://www.sqlite.org/mostdeployed.html
  "SQLite is likely used more than all other database
   engines combined."
Source 2: independent coverage repeating the claim with
  the same first-party origin.
Confidence: high for "most deployed", noting the primary
source is the SQLite project itself.

One-line tweak

Batch it: paste five claims and ask for five verdict blocks — the per-claim cost is one search plus two reads.

Pairs with

perplexity

A second, independent grounding engine to cross-check the verdict.

academic-researcher

Escalates to papers when the claim needs more than web sources.

Read JavaScript-heavy pages that curl can't

SPAs return an empty <div id="root"> to plain HTTP fetches. Reader renders the page in a headless browser first, so client-side content comes back as Markdown.

ForAnyone who has piped a React app into curl and received 1 KB of nothing.

The prompt

Use the jina-ai skill. This page is a client-rendered SPA and curl returns an empty shell: https://app.example.com/public/changelog. Fetch it with `python3 scripts/jina_tools.py read` instead and confirm you got real content by printing the first three headings from content_markdown. Then summarize the last five releases listed.

What slides.md looks like

$ curl -s https://app.example.com/public/changelog | wc -c
1204    # <div id="root"></div> and a script tag

$ python3 scripts/jina_tools.py read "https://app.example.com/public/changelog" \
    | jq -r '.content_markdown' | grep '^#' | head -3
## v2.41 — Edge caching for media
## v2.40 — SSO domain enforcement
## v2.39 — Bulk export API

One-line tweak

If a page needs a click or login before content appears, Reader can't help — that's headless-browser-with-a-session territory.

Pairs with

playwright

Takes over when the page needs interaction, auth, or form input.

firecrawl-scraper

Purpose-built scraping with structured extraction schemas, beyond read-as-markdown.

Digest every link in a newsletter or awesome-list

Twelve links become an annotated bibliography: one verdict line per link, ranked by relevance to what you're building. Claude reads all of them; you read one paragraph.

ForDevelopers with 40 open tabs from three newsletters and zero memory of why.

The prompt

Use the jina-ai skill. Here are the 12 links from this week's newsletter: [paste links]. Read each with `python3 scripts/jina_tools.py read`. Return an annotated bibliography sorted by relevance to building LLM agents in TypeScript: for each link, one line — title, 15-word summary, and a tag from {read-now, skim, skip}. End with the single link I should read first and why, in one sentence.

What slides.md looks like

1. [read-now] Streaming tool calls without buffering —
   pattern for incremental tool-call parsing in TS agents.
2. [skim] Postmortem: agent loop burned $400 — retry logic
   lesson, applies to any provider.
3. [skip] "Top 10 AI tools" — listicle, no code.
...
Read first: #1 — it solves the exact buffering bug in
your current agent runner.

One-line tweak

Change the relevance anchor ('building LLM agents in TypeScript') per run — the same links rank differently for different projects.

Pairs with

obsidian

Append the bibliography to a weekly note so digests accumulate.

deep-research

Promotes a read-now link into a full multi-source investigation.

Call Reader directly when the script is overkill

The skill's pattern without the skill's Python: r.jina.ai is a URL prefix, so any HTTP client is an integration. Useful inside CI, other agents, or a five-line script of your own.

ForAnyone wiring URL-to-Markdown into an existing pipeline that already speaks HTTP.

The prompt

Use the jina-ai skill as reference. Write me a minimal fetch_markdown function for our TypeScript agent that GETs https://r.jina.ai/<target-url> with an Authorization: Bearer header from JINA_API_KEY and X-Return-Format: json, parses {data: {title, url, content}}, and throws a typed error on HTTP 451 (rate-limit/abuse) telling the caller to add an API key. Include the curl equivalent in a comment for debugging.

What slides.md looks like

# the whole integration, as curl:
curl "https://r.jina.ai/https://example.com/post" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Return-Format: json"

{ "data": { "title": "...", "url": "...",
    "content": "## The post, as Markdown..." } }

One-line tweak

Drop X-Return-Format: json to get raw Markdown back — simpler when you're piping straight into a prompt.

Pairs with

jina-ai

The MCP-server packaging of the same APIs for non-Claude-Code clients.

firecrawl

Reach for it when the job is crawling a whole site, not reading known URLs.

Community signal

Three voices from people using Jina Reader for real. The first two are Simon Willison — the most credible independent endorsement this tool category has — and the third is the free-key discovery that fixes most rate-limit complaints.

“Jina.ai offer a really neat (currently free) API for this - you add https://r.jina.ai/ on the beginning of any API and it gives you back a Markdown version of the main content of that page, suitable for piping into an LLM.”

simonw (Simon Willison, HN) · Hacker News

The clearest third-party endorsement on the table — Willison has been recommending the prefix trick across multiple HN threads since 2024.

“Right: the token savings can be enormous here. Use https://tools.simonwillison.net/jina-reader to fetch the https://news.ycombinator.com/ homepage as Markdown and paste it into https://tools.simonwillison.net/claude-token-counter - 1550 tokens. Same thing as HTML: 13367 tokens.”

simonw (Simon Willison, HN) · Hacker News

The measured case for Markdown conversion: the same page costs roughly 8x fewer tokens as Markdown than as raw HTML. This is the economics underneath every use case below.

“TIL Jina Reader API has (low rate limit) API key free option. https://jina.ai/reader/”

flakiness (HN) · Hacker News

The free-key discovery moment — the keyless tier is throttled hard, and a free key changes the experience.

The contrarian take

Not everyone has a smooth ride. The most useful critique comes from djoldman (HN), who hit the anonymous tier’s abuse wall mid-session:

“jina is getting cranky: ... “SecurityCompromiseError: Your request is categorized as abuse. Please don't abuse our service. If you are sure you are not abusing, please authenticate yourself with an API key.””

djoldman (HN) · Hacker News

From the HN thread where simonw's Jina-backed tool started erroring for other users.