Bright Data MCP: Official Setup & Scraping Guide (2026)

TL;DR + what you actually need

Three facts cover 90% of setup questions:

One secret: an API_TOKEN from your Bright Data account settings. It works for both the hosted endpoint and the local npx server. No zone configuration needed to start — the server provisions default Web Unlocker and Browser zones for you.
Two install paths: the hosted streamable-HTTP endpoint https://mcp.brightdata.com/mcp?token=YOUR_TOKEN (nothing to install, recommended), or the local stdio server npx @brightdata/mcp with API_TOKEN in the env block.
Two modes: Rapid (default, free tier) ships the core tools — search_engine, scrape_as_markdown, discover, and their batch variants. Pro mode (PRO_MODE=true or &pro=1 on the URL) unlocks 60+ tools: structured site extractors, browser automation, AI-answer monitoring. Pro usage beyond free credits is pay-as-you-go.

The fastest one-liner, for Claude Code with the hosted server:

claude mcp add --transport http brightdata "https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN"

The rest of this guide explains why the mode split exists, what each tool group does, and where the credits actually go.

What Bright Data MCP actually does

Bright Data is a web-data platform built on a large residential-proxy network — consumer IP addresses that make scraping traffic look like ordinary browsing. The MCP server (the company calls it Web MCP) is the agent-facing door to that infrastructure. When your agent calls scrape_as_markdown, the request routes through the Web Unlocker: proxy rotation, browser fingerprinting, CAPTCHA handling, and JavaScript rendering happen server-side, and the tool response is the page as clean markdown. The agent never sees a 403.
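To make that request flow concrete, here is a minimal sketch of the JSON-RPC message an MCP client sends when the agent calls scrape_as_markdown. The tools/call method and the name/arguments layout follow the MCP specification; the url argument name and the target page are assumptions for illustration, so check the input schema the server advertises.

```python
import json


def build_tool_call(request_id, tool, arguments):
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, indent=2)


# "url" is an assumed argument name; the page is a placeholder.
payload = build_tool_call(1, "scrape_as_markdown", {"url": "https://example.com/"})
print(payload)
```

Your MCP client builds and sends this message for you. The sketch only shows what crosses the wire before the Unlocker does its work server-side.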

Four distinct Bright Data products sit behind the one tool list:

Web Unlocker — the anti-bot bypass layer behind scrape_as_markdown and scrape_as_html. This is the reason people pay for Bright Data: it succeeds on pages where plain fetchers and headless browsers get blocked.
SERP API — behind search_engine. Google, Bing, and Yandex results returned as structured data instead of a fragile scrape of a results page.
Web Scraper API — behind the web_data_* tools. Maintained extractors that return typed JSON for Amazon products, LinkedIn profiles, Instagram posts, and dozens more, so you skip parser maintenance entirely.
Scraping Browser — behind the scraping_browser_* tools. A remote browser the agent can navigate, click, and type into for multi-step flows that a single fetch can’t reach.

The server is open source (MIT) at github.com/brightdata/brightdata-mcp. One naming note: Bright Data was formerly called Luminati, and older links to luminati-io/brightdata-mcp redirect to the same repo. Same server, one canonical entry on this site at /servers/bright-data.

Like every wrapper-style MCP server, it adds no intelligence of its own — it is a metered door to a paid platform. That cuts both ways: you get infrastructure that took a decade to build, and every tool call draws from a credit pool you need to understand before you wire it to a loop-happy agent.

We compared the most popular Web Search MCP Servers with the same exact prompts on @fairies_agent

Exa
Perplexity
Firecrawl
Bright Data

Each server has their own unique strengths and limitations. See below the the results!
Robert Yang (@GuangyuRobert), June 5, 2025

Rapid vs Pro vs custom — pick your tool surface

The mode system exists because of context pollution: a server that registers 60+ tools dumps 60+ schemas into the model’s context on every session, costs input tokens, and makes tool selection worse. Bright Data shipped tool groups during its MCP “launch week” specifically to fix this. Three configurations:

Rapid (default): the lean core — search_engine, scrape_as_markdown, discover, plus batch variants. Covers most agent workflows: search the web, fetch a protected page, rank results by relevance.
Pro: everything. Set PRO_MODE=true in the env (local) or append &pro=1 to the hosted URL. Adds structured extractors, browser automation, HTML scraping, AI-powered extract, and the AI-answer insight tools.
Custom: cherry-pick with GROUPS (comma-separated group IDs such as ecommerce, social, browser, finance, research, app_stores, travel, code, advanced_scraping) or TOOLS (individual tool names). On the hosted endpoint the same selection works as query parameters, e.g. &groups=ecommerce,social.

Our opinionated default: start in Rapid mode. Switch to a custom GROUPS selection the day you need a specific extractor, and reserve full Pro mode for exploration sessions where you want to see what exists. Registering all 60+ tools permanently in a coding agent is how you end up with the model calling web_data_zillow_properties_listing when you asked it to check an npm package.
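The three configurations can be sketched as a small helper that builds the env block for the local server. The variable names (API_TOKEN, PRO_MODE, GROUPS) and the group IDs come from this guide. The helper itself, the validation step, and the choice to treat Pro and GROUPS as mutually exclusive are assumptions of this sketch, not documented server behavior.

```python
from typing import Optional

# Group IDs named in this guide; the server may accept others.
KNOWN_GROUPS = {
    "ecommerce", "social", "browser", "finance", "research",
    "app_stores", "travel", "code", "advanced_scraping",
}


def build_env(api_token: str, groups: Optional[list] = None, pro: bool = False) -> dict:
    """Build the env block: Rapid (default), full Pro, or a custom GROUPS selection."""
    env = {"API_TOKEN": api_token}
    if pro:
        env["PRO_MODE"] = "true"  # must be the string "true"
    elif groups:
        unknown = sorted(set(groups) - KNOWN_GROUPS)
        if unknown:
            raise ValueError(f"unrecognized group IDs: {unknown}")
        env["GROUPS"] = ",".join(groups)
    return env


print(build_env("YOUR_API_TOKEN"))                                     # Rapid
print(build_env("YOUR_API_TOKEN", groups=["advanced_scraping", "ecommerce"]))  # custom
print(build_env("YOUR_API_TOKEN", pro=True))                           # Pro
```

Failing fast on a misspelled group ID is a deliberate choice here: a typo would otherwise surface only as a missing tool at session start.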

Auth + the credit model

Auth is one token. Create a Bright Data account, copy the API token from account settings, and pass it either as the token query parameter (hosted) or the API_TOKEN env var (local). There is no OAuth flow, no key pair, no scopes — which makes setup trivial and makes the token a full-account credential. Treat it like a password: env vars or a secrets manager, never a committed config file.

The pricing shape, in evergreen terms (numbers current at the time of writing — verify on brightdata.com):

Free monthly allowance: the Rapid tier includes 5,000 requests per month, no credit card required, renewing automatically. Unused credits do not roll over. On team accounts the pool is shared across all users.
Credit accounting: base tools (search, scrape) cost 1 credit per request. Structured web_data_* tools cost 1 credit per record — a product-reviews extractor that returns 200 reviews consumes 200 credits, not 1. This is the single most misunderstood line item.
Depletion behavior: when credits run out, requests fail rather than auto-billing. You only spend real money after depositing funds for pay-as-you-go Pro usage. That default is friendlier than most usage-billed APIs.

A useful mental check before adopting: 5,000 requests a month is roughly 165 a day. A research agent doing a few dozen scrapes per session fits comfortably. A monitoring pipeline polling 500 product pages hourly does not: that is 12,000 requests a day, a paid workload that you should price against Bright Data’s per-record rates before committing.
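The same check as arithmetic, in a short sketch. The 5,000-credit allowance and the per-request and per-record rules are the figures quoted above. The three workloads, including "three sessions a day", are illustrative assumptions.

```python
FREE_MONTHLY_CREDITS = 5_000  # allowance quoted above; verify current terms


def monthly_credits(credits_per_run, runs_per_day, days=30):
    """Credits a job consumes in a month when it runs a fixed number of times a day."""
    return credits_per_run * runs_per_day * days


def fits_free_tier(credits):
    return credits <= FREE_MONTHLY_CREDITS


workloads = {
    # ~36 scrapes per session, three sessions a day (assumed)
    "research agent": monthly_credits(credits_per_run=36, runs_per_day=3),
    # 500 product pages polled every hour
    "hourly monitoring": monthly_credits(credits_per_run=500, runs_per_day=24),
    # one extractor call a day that returns 200 records (1 credit per record)
    "daily 200-review pull": monthly_credits(credits_per_run=200, runs_per_day=1),
}

for name, used in workloads.items():
    print(f"{name}: {used:,} credits/month, fits free tier: {fits_free_tier(used)}")
```

Under these assumptions the research agent uses 3,240 credits and fits, the hourly monitor needs 360,000, and even a single 200-record extractor call per day lands at 6,000, past the free allowance.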

Install (every client)

Unlike most MCP servers, Bright Data’s default recommendation is the hosted remote server — no Node, no subprocess, no version drift. Clients that speak streamable HTTP (defined in the MCP spec as the remote-transport successor to SSE) point at:

https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN
# add &pro=1 for Pro mode, &groups=ecommerce,social for custom
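If you generate client configs from a script, the URL variants above can be assembled with the standard library. This is a sketch: the endpoint and the token, pro, and groups parameters are the ones shown above, while the helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://mcp.brightdata.com/mcp"


def hosted_url(token, pro=False, groups=None):
    """Build the hosted endpoint URL for Rapid, Pro, or a custom group selection."""
    params = {"token": token}
    if pro:
        params["pro"] = "1"
    if groups:
        params["groups"] = ",".join(groups)
    # safe="," keeps the comma-separated group list literal instead of %2C
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(hosted_url("YOUR_API_TOKEN"))
print(hosted_url("YOUR_API_TOKEN", pro=True))
print(hosted_url("YOUR_API_TOKEN", groups=["ecommerce", "social"]))
```

Because the token travels in the query string, treat the finished URL as a secret in its own right and keep it out of logs and committed files.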

For clients where you prefer a local stdio server, the install panel below pulls configs from the canonical /servers/bright-data catalog entry. Copy your client’s snippet, paste in your token, restart.

[Install panel: per-client one-line install snippets for Bright Data; see the server page at /servers/bright-data]

Client-specific notes:

Claude Code — hosted: claude mcp add --transport http brightdata "https://mcp.brightdata.com/mcp?token=YOUR_TOKEN". Local: claude mcp add brightdata -e API_TOKEN=YOUR_TOKEN -- npx -y @brightdata/mcp. Add --scope project to register in the repo’s .mcp.json. Full flag reference at /clients/claude-code.
Claude Desktop — add the mcpServers JSON block (the npx form) to claude_desktop_config.json, or use the remote-server UI on paid plans with the hosted URL.
Cursor — paste either shape into ~/.cursor/mcp.json; Cursor supports both url (hosted) and command (local) entries. Config locations at /clients/cursor.

Codex CLI — stdio form in ~/.codex/config.toml:

[mcp_servers.brightdata]
command = "npx"
args = ["-y", "@brightdata/mcp"]

[mcp_servers.brightdata.env]
API_TOKEN = "YOUR_API_TOKEN"
PRO_MODE = "false"

Windsurf / VS Code / Gemini CLI — same JSON shape in each client’s MCP config file; the install panel above emits the exact snippet per client.
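For the JSON-based clients, the two entry shapes described above (command for local, url for hosted) can be generated rather than hand-edited. This is a sketch under assumptions: the server name brightdata mirrors the Codex example, the token is read from an API_TOKEN environment variable so it never lands in a committed file, and some clients use a different top-level key than mcpServers, so check your client's docs before pasting.

```python
import json
import os


def local_entry(token):
    """stdio form: the client spawns the server with npx."""
    return {"command": "npx", "args": ["-y", "@brightdata/mcp"], "env": {"API_TOKEN": token}}


def hosted_entry(token):
    """Remote form: the client connects to the hosted endpoint."""
    return {"url": f"https://mcp.brightdata.com/mcp?token={token}"}


def mcp_config(entry, name="brightdata"):
    """Wrap one server entry in an mcpServers block and render it as JSON."""
    return json.dumps({"mcpServers": {name: entry}}, indent=2)


token = os.environ.get("API_TOKEN", "YOUR_API_TOKEN")
print(mcp_config(local_entry(token)))
print(mcp_config(hosted_entry(token)))
```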

Verification: ask the agent “which Bright Data tools do you have?” In Rapid mode you should see the core search and scrape tools. If the list is empty, jump to troubleshooting.

Tools walkthrough

Grouped by what agents actually do with them. Exact tool names below are from the official docs; the full registry lives at docs.brightdata.com.

Core (Rapid mode)

search_engine scrapes Google, Bing, or Yandex results — JSON for Google, markdown for the others. This is the tool that makes Bright Data a serious web-search MCP contender: you get the actual SERP, ranks included, not a re-ranked summary. scrape_as_markdown fetches any single URL through the Web Unlocker and returns LLM-ready markdown. discover searches and ranks results by AI-scored relevance — useful when the agent needs “the five best sources,” not page one of Google. Batch variants (search_engine_batch, scrape_batch) run up to 10 queries or URLs in parallel, which matters for credit-efficient monitoring jobs.

Advanced scraping (Pro)

scrape_as_html returns raw HTML when markdown conversion loses structure you need. extract scrapes a page and converts it to structured JSON using AI sampling — schema-on-demand for sites that have no dedicated extractor. session_stats reports tool usage in the current session, handy for auditing what an agent actually spent.

Structured extractors: web_data_* (Pro)

The largest group — 40+ extractors that return typed records instead of page text. E-commerce: web_data_amazon_product plus Walmart, eBay, Home Depot, Etsy, Best Buy, and Google Shopping. Social: LinkedIn, Instagram, Facebook, TikTok, X, YouTube, Reddit posts, profiles, and comments. Business: Crunchbase, ZoomInfo, Zillow, Google Maps reviews. Developer: web_data_npm_package, web_data_pypi_package, GitHub repository files. Remember the billing unit: one credit per record returned.

Browser automation: scraping_browser_* (Pro)

Twelve tools that drive a remote browser — navigate, click, type, scroll, wait, screenshot, and scraping_browser_snapshot for accessibility-tree captures. Use these when the data sits behind interaction: a login wall, an infinite scroll, a multi-step form. Slower and costlier than a single scrape; reach for them only after scrape_as_markdown fails.

AI-answer insights (Pro)

The 2026 additions: tools like web_data_chatgpt_ai_insights (with Grok and Perplexity equivalents) query AI assistants and return what they say — built for GEO (generative engine optimization) monitoring. If your marketing team asks “what does ChatGPT say about our brand?”, this is the tool that answers it on a schedule.

Recipes

Six workflows where the server earns its registration. Each is a single prompt to an agent with the server installed.

Recipe 1 — Scrape a bot-protected site

The headline use case. Prompt: “Fetch https://example-retailer.com/product/123 with scrape_as_markdown and summarize the spec table.” Sites that 403 a plain fetch or serve a CAPTCHA to headless Chromium come back as markdown because the Unlocker handles the fight server-side. Cost: one credit. If markdown drops a table you need, retry with scrape_as_html (Pro).

Recipe 2 — SERP monitoring

Prompt: “Run search_engine_batch for these 8 keywords on Google. For each, report the top 5 URLs and note where ourdomain.com ranks. Output a markdown table.” Eight credits per run; daily for a month is well inside the free tier. Because search_engine returns real SERP JSON, rank positions are data, not the agent’s guess from a summarized page.
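The rank lookup the prompt asks for is simple once the result URLs are in order. A minimal sketch follows, assuming the agent has already reduced the SERP response to an ordered list of URLs; the real response schema is not shown here and the sample URLs are made up.

```python
from urllib.parse import urlparse


def rank_of(domain, result_urls):
    """Return the 1-based position of the first result on `domain` or a subdomain, else None."""
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Hypothetical ordered organic results for one keyword.
results = [
    "https://competitor.example/pricing",
    "https://www.ourdomain.com/blog/post",
    "https://another.example/",
]
print(rank_of("ourdomain.com", results))  # 2
```

Matching on the parsed hostname rather than a substring avoids false hits such as notourdomain.com. On budget, eight credits a day over a 30-day month is 240 credits.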

Recipe 3 — E-commerce price tracking

Prompt: “Use web_data_amazon_product on these 12 ASINs. Extract price, availability, and review count, then diff against yesterday’s CSV and flag changes over 5%.” Structured extractors return consistent fields, so the diff step is trivial. Watch the per-record billing if you add a reviews extractor to the loop.
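The diff step might look like the sketch below. The asin and price column names, the placeholder ASINs, and the prices are assumptions for illustration; the extractor's actual field names may differ.

```python
import csv
import io


def load_prices(csv_text):
    """Map ASIN -> price from CSV text with 'asin' and 'price' columns."""
    return {row["asin"]: float(row["price"]) for row in csv.DictReader(io.StringIO(csv_text))}


def flag_changes(yesterday, today, threshold=0.05):
    """Return (asin, old, new, relative_change) for prices that moved more than `threshold`."""
    flagged = []
    for asin, new in today.items():
        old = yesterday.get(asin)
        if not old:  # new ASIN or zero price: no baseline to compare against
            continue
        change = (new - old) / old
        if abs(change) > threshold:
            flagged.append((asin, old, new, round(change, 4)))
    return flagged


yesterday_csv = "asin,price\nB000000001,20.00\nB000000002,50.00\n"
today = {"B000000001": 21.50, "B000000002": 51.00}  # placeholder data
flagged = flag_changes(load_prices(yesterday_csv), today)
print(flagged)
```

In this sample only the first ASIN is flagged: it moved 7.5%, while the second moved 2%.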

Recipe 4 — Competitive social listening

Prompt: “Pull the latest posts from these three competitor X accounts and their newest YouTube videos. Summarize themes and announcements from the past two weeks.” The social web_data_* extractors fetch public posts and profiles without you maintaining API keys for each platform — the practical draw for research agents.

Recipe 5 — Multi-step extraction with the Scraping Browser

Prompt: “Open the dealer-locator page, type 94103 into the ZIP field, submit, and extract the resulting list with scraping_browser_snapshot.” The agent chains navigate → type → click → snapshot. Reserve this for data a single request can’t reach; each step is a separate metered call.

Recipe 6 — GEO / AI-answer monitoring

Prompt: “Ask ChatGPT, Grok, and Perplexity ‘what is the best proxy provider for web scraping?’ via the AI insight tools and report whether our brand appears and how it is described.” Brand-presence-in-AI-answers tracking, runnable on a weekly schedule from any MCP client.

Cost control + limits

The failure mode to engineer against is not rate limiting — it’s an agent loop silently draining credits. An open GitHub issue on the official repo asks for exactly this: policy enforcement to prevent runaway scraping costs. Until the server ships guardrails, they are your job:

Cap iterations in the prompt. “Scrape at most 10 pages, then stop and report” is cheap insurance against a retry loop that burns 500 credits on one stubborn URL.
Batch instead of looping. scrape_batch with 10 URLs is one round-trip and predictable spend; ten sequential scrape_as_markdown calls invite the model to improvise between them.
Audit with session_stats. In Pro mode, end long sessions by asking the agent to call session_stats and report tool usage. Cheap observability.
Mind the per-record multiplier. A single web_data_* call returning hundreds of records bills hundreds of credits. Constrain result counts in your prompt when the extractor supports it.
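If you drive the server from your own orchestration code rather than a chat prompt, the first two guardrails can be sketched client-side. CreditBudget is a hypothetical helper, not part of the server, and the assumption that a batch of N URLs costs N credits should be verified against your usage dashboard.

```python
def chunked(items, size=10):
    """Split items into consecutive batches of at most `size` (scrape_batch takes up to 10 URLs)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


class CreditBudget:
    """Hypothetical guard: refuse work once a per-session credit cap would be exceeded."""

    def __init__(self, cap):
        self.cap = cap
        self.spent = 0

    def charge(self, credits):
        if self.spent + credits > self.cap:
            raise RuntimeError(
                f"budget exceeded: {self.spent} spent, {credits} requested, cap {self.cap}"
            )
        self.spent += credits


urls = [f"https://example.com/page/{n}" for n in range(25)]
budget = CreditBudget(cap=20)
for batch in chunked(urls):
    try:
        budget.charge(len(batch))  # assumed: 1 credit per URL in the batch
    except RuntimeError as err:
        print(f"stopping: {err}")
        break
    print(f"would call scrape_batch with {len(batch)} URLs")
```

With a cap of 20, the first two batches of 10 go through and the final batch of 5 is refused, which is the behavior you want from a runaway loop.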

Latency expectations: simple scrapes return in seconds, but hard targets that trigger CAPTCHA-solving and rendering can take much longer — the web_data_* polling timeout defaults to 600 seconds. Set client timeouts accordingly, and raise BASE_TIMEOUT / BASE_MAX_RETRIES (0–3) for flaky targets rather than letting the agent re-fire the whole call.

Troubleshooting

`spawn npx ENOENT` on launch

Your MCP client can’t find npx on its PATH, which is common when the client is a GUI app launched outside your shell. Put the absolute path to npx in the config’s command field (run which npx in a terminal to find it), or switch to the hosted endpoint and skip Node entirely. This is the top reported issue on the repo.
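A quick way to see what a process can actually resolve is to ask the standard library. This is a sketch: it reports the PATH of the process that runs it, so a result from your terminal shows the path to paste into the config, not what the GUI client itself sees.

```python
import os
import shutil


def resolve_command(name="npx"):
    """Return the absolute path of `name` as resolved from this process's PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"{name} not found on PATH; install Node.js or use the hosted endpoint"
        )
    return os.path.abspath(path)


try:
    print(resolve_command("npx"))
except FileNotFoundError as err:
    print(err)
```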

Tools time out on hard targets

Unlocker-heavy scrapes (CAPTCHA + rendering) can exceed default client timeouts. Raise the client’s MCP tool timeout to ~180 seconds, set BASE_TIMEOUT higher for base tools, and leave the POLLING_TIMEOUT default (600s) alone for web_data_* tools.

Pro tools don’t appear

Mode flag missing or in the wrong place. Local: PRO_MODE must be the string "true" inside the server’s env block, not your shell. Hosted: &pro=1 belongs on the endpoint URL itself. Restart the client after either change — tool lists load at session start.

Every call fails with an auth error

Token problem. Confirm the token is current in your Bright Data account settings, has no whitespace from the copy-paste, and — on the hosted URL — is properly part of the query string. If your network has an egress proxy or firewall, confirm mcp.brightdata.com is reachable.
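The formatting half of that checklist can be automated. The sketch below only checks the shape of a hosted URL (host, presence of the token parameter, stray whitespace). It cannot tell whether the token is valid; only a real request to Bright Data can.

```python
from urllib.parse import urlparse, parse_qs


def check_hosted_url(url):
    """Return a list of formatting problems found with a hosted endpoint URL."""
    problems = []
    parts = urlparse(url)
    if parts.hostname != "mcp.brightdata.com":
        problems.append(f"unexpected host: {parts.hostname}")
    tokens = parse_qs(parts.query, keep_blank_values=True).get("token", [])
    if not tokens:
        problems.append("no token query parameter")
    elif not tokens[0] or tokens[0] != tokens[0].strip():
        problems.append("token is empty or has leading/trailing whitespace")
    return problems


print(check_hosted_url("https://mcp.brightdata.com/mcp?token=abc123"))    # no problems
print(check_hosted_url("https://mcp.brightdata.com/mcp?token=abc123%20"))  # trailing space
print(check_hosted_url("https://mcp.brightdata.com/mcp&token=abc123"))    # "&" instead of "?"
```

The third case is the easy one to miss: with & in place of ?, the token becomes part of the path and the query string is empty.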

Requests suddenly stop succeeding mid-month

Credit pool exhausted — by design, depletion fails requests instead of billing you. Check usage in the Bright Data dashboard. On team accounts remember the free pool is shared across every user; one colleague’s scraping spree empties it for everyone.

What we got wrong

Three assumptions we made when first covering this server, all wrong:

We assumed the npx server was the main path. It was, in early versions. The hosted endpoint at mcp.brightdata.com is now the better default — it removes the Node-version and ENOENT failure class entirely and gets updates without re-installing.
We assumed Pro mode was a paid plan toggle. It isn’t — it’s a tool-visibility flag on the same token. Enabling it costs nothing by itself; spend only starts when paid tools draw past the free credits, and even then only if you’ve deposited funds.
We ran full Pro mode in a coding agent. 60+ tool schemas in context made every session slower and tool selection visibly worse. The GROUPS filter exists precisely because of this; we now register advanced_scraping plus one domain group per project.

Community signal

The recurring community verdict is some version of “heaviest hammer in the drawer.” Daniel Miessler, who tiers his scraping stack as curl → Jina → Firecrawl → Bright Data inside a Claude Code command, puts it bluntly: when he absolutely needs the data, Bright Data is the one that consistently works (disclosure: his post is sponsored by Bright Data, though he states he was a paying user first). Side-by-side comparisons like Robert Yang’s test above reach a similar split — Exa and Perplexity for semantic search quality, Bright Data for raw retrieval reliability on hostile targets.

Practitioner content keeps the same shape: agentic-RAG tutorials pair the server with a vector database for scrape-then-index pipelines —

Here’s the MCP-powered agentic RAG workflow again for your reference.

It integrates two MCP tools:
- Bright Data to scrape data at scale for Agents, and
- Qdrant vector database for vector database search.

Everything is set up locally on your machine!
Avi Chawla (@_avichawla), April 8, 2025

The contrarian voice is serious and worth sitting with. In June 2026, researchers at Include Security published a reverse-engineering of Bright Data’s consumer SDK showing that free apps, including ones on always-on smart TVs, act as residential exit nodes for the scraping network, with the traffic originating from users’ home IPs. The debate centers on consent: Bright Data says its opt-in screen is explicit, named, and reversible; critics question whether users understand what they’re agreeing to. Google, Amazon, and Roku restricted background proxy SDKs in response, and Bright Data dropped those platforms. None of this changes how the MCP server works, but if your organization audits the provenance of its traffic, this belongs in the evaluation, not a footnote.

When to use alternatives

Our Take

Use Bright Data MCP when targets fight back: bot-protected sites, SERP data at rank-level fidelity, structured e-commerce/social extraction. The free tier covers a surprising amount of agent work. Skip it if your targets are ordinary public pages (cheaper tools suffice), if you need large-scale crawling logic rather than per-page retrieval, or if residential-proxy provenance is a compliance problem for your org.

Ordinary pages, simple pipelines: a plain fetch or the free Jina reader tier handles unprotected URLs at zero cost — our Jina AI skill cookbook covers that pattern. Spending Unlocker credits on Wikipedia is waste.
Crawl jobs and site-wide extraction: Firecrawl and Crawlee are built around crawl frontiers, sitemaps, and recursive extraction; Bright Data’s MCP tools are per-page/per-query. The trade-offs are mapped in our five-way scraping-stack comparison and the Firecrawl vs Crawlee vs Playwright matchup.
Semantic search, not retrieval: if the job is “find the best sources on X” rather than “fetch this exact page,” Exa- and Perplexity-style servers rank higher on answer quality. Our best web-search MCP servers roundup scores the field.

FAQ

Is Bright Data MCP free?

Partly. At the time of writing, the default Rapid mode includes 5,000 free requests per month — no credit card required, renewing monthly, with unused credits not rolling over. Base tools cost 1 credit per request; structured web_data_* tools cost 1 credit per record. When credits run out, requests stop rather than billing you, unless you have deposited funds for pay-as-you-go Pro usage. Check brightdata.com for current allowances.

What is the Bright Data MCP URL?

The hosted remote endpoint is https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN — a streamable-HTTP server Bright Data runs for you, so there is nothing to install. Append &pro=1 to enable the full Pro tool set. The local alternative is a stdio server launched with `npx @brightdata/mcp` and an API_TOKEN environment variable. Both authenticate with the same token from your Bright Data account settings.

Bright Data MCP vs Firecrawl MCP — which should I use?

Firecrawl is a developer-first scraping API with clean markdown output and simple pricing; it struggles on heavily protected targets. Bright Data routes through a residential-proxy unlocker network, so it succeeds on sites that block everything else, and adds structured extractors for Amazon, LinkedIn, and 40+ domains. Use Firecrawl for ordinary sites and crawl jobs; use Bright Data when targets fight back. Many teams register both.

How does the Web Unlocker work?

Web Unlocker is Bright Data's anti-bot bypass layer. When the MCP server's scrape_as_markdown tool hits a protected page, the request is routed through Bright Data's proxy network with automatic browser fingerprinting, CAPTCHA solving, retries, and JavaScript rendering as needed. Your agent sees only the final result: page content returned as clean markdown. You never manage proxies, headers, or CAPTCHA solvers yourself.

What is Pro mode and what does it cost?

Pro mode unlocks the full tool surface — 60+ tools including batch scraping, scrape_as_html, AI-powered extract, structured web_data_* extractors for e-commerce and social sites, and the 12 scraping-browser automation tools. Enable it with PRO_MODE=true locally or &pro=1 on the hosted URL. Usage beyond the free monthly credits is pay-as-you-go against deposited funds; rates vary by tool, so check Bright Data's current pricing page.

Is luminati-io/brightdata-mcp the same repo as brightdata/brightdata-mcp?

Yes. Bright Data was formerly named Luminati, and GitHub redirects the old luminati-io organization URLs to the current brightdata organization. Both paths land on the same MIT-licensed repository. If a tutorial points you at luminati-io/brightdata-mcp, you are not looking at a fork or an abandoned variant — it is the official server.

Can Bright Data MCP scrape Amazon, LinkedIn, or Instagram?

Yes, via the Pro-mode web_data_* extractors — purpose-built tools that return structured JSON (not raw HTML) for products, profiles, posts, and reviews across 40+ major sites including Amazon, Walmart, eBay, LinkedIn, Instagram, TikTok, X, YouTube, and Reddit. These cost 1 credit per record and are the most reliable way to get e-commerce or social data into an agent without writing parsers.

Is using Bright Data's residential proxy network ethical?

Contested. Bright Data's exit nodes are consumer devices whose owners opt in through an SDK consent screen — security researchers published findings in June 2026 questioning how meaningful that consent is, and Google, Amazon, and Roku restricted background proxy SDKs in response. Bright Data says its opt-in is explicit and reversible. If your compliance team cares about traffic provenance, read the research before adopting.

Sources

Official repository (MIT): github.com/brightdata/brightdata-mcp (old luminati-io URLs redirect here)
Official MCP docs: docs.brightdata.com/ai/mcp-server/overview (modes, tool registry, free-tier terms)
npm package: npmjs.com/package/@brightdata/mcp
Tool Groups launch: brightdata.com/ai/mcp-server/launch-week/day1
Web-search MCP comparison (community): Robert Yang on X
Agentic RAG workflow (community): Avi Chawla on X
Practitioner write-up (sponsored): danielmiessler.com — Web Scraping with Bright Data and Claude Code
Contrarian / security research: The Hacker News — smart TVs as web-scraping proxies for AI (Include Security findings, June 2026)
Cost-guardrail discussion: brightdata-mcp GitHub issues
Canonical MCP.Directory entry: /servers/bright-data


Found an issue?

If something in this guide is out of date — a new tool group, changed free-tier terms, a different hosted endpoint — email [email protected] or read more on our about page. We keep these guides current.
