
Selenium MCP: Browser Automation from Claude (2026)

The Selenium MCP server turns Selenium WebDriver into a set of tools Claude can call directly (start a browser, navigate, click, type, read text, run JavaScript, take screenshots) with no script to write. Built by Angie Jones, it speaks the same locator language as the Selenium suite your QA team already maintains. This guide covers what it does, how it honestly compares with Playwright MCP and Chrome DevTools MCP, installation for every client, the full tool surface, three working recipes, and the limits the README won’t tell you about.

Updated June 12, 2026 · ~14 min read · 3,200 words

TL;DR

Selenium MCP is a stdio MCP server that wraps Selenium WebDriver — the W3C-standard browser automation protocol — so an AI agent can drive Chrome, Firefox, Edge, and Safari from natural language. Everything you need to start:

  • One command, no keys: claude mcp add selenium -- npx -y @angiejones/mcp-selenium@latest. The server has no auth, no env vars, no account — it drives browsers already on your machine.
  • Four real browsers: Chrome, Firefox, Edge, Safari. Safari needs a one-time sudo safaridriver --enable on macOS and has no headless mode.
  • The honest default: if you have no existing Selenium investment, install Playwright MCP instead. Selenium MCP earns its slot when your team already thinks in WebDriver locators and ships Selenium suites — the decision section below makes the case both ways.

Ask your agent “Open Chrome, go to github.com/angiejones, and take a screenshot” and it chains start_browser → navigate → take_screenshot on its own.

What Selenium MCP does

Selenium WebDriver is the twenty-year-old workhorse of browser test automation: a wire protocol where a driver binary (chromedriver, geckodriver, safaridriver) accepts commands like “find the element matching this CSS selector, click it.” Normally you write those commands as Java, Python, or JavaScript test code. The Selenium MCP server moves that command surface behind MCP tools, so the model writes the commands at runtime instead of you writing them ahead of time.

Concretely: when you tell Claude “fill the signup form on staging and submit it,” the agent calls start_browser with { browser: "chrome" }, then navigate, then send_keys with locator strategies it picks itself (id, css, xpath, name, tag, class), then interact with action: "click" on the submit button. Every call runs through a real WebDriver session — the same engine your CI suite uses, not an embedded headless shim.
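
To make that chain concrete, here is a minimal sketch of what the client sends on the agent's behalf. The `tools/call` JSON-RPC framing comes from the MCP specification; the tool names and the `browser`, `action`, `by`, and `value` fields come from this guide. The `url` and `text` argument names, the selectors, and the test email are assumptions for illustration, so check the server's tool schema before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, one per call


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumed for illustration: the `url` and `text` argument names, the selectors,
# and the email address. Only the tool names and by/value/action are from the guide.
signup_flow = [
    tool_call("start_browser", {"browser": "chrome"}),
    tool_call("navigate", {"url": "https://staging.example.com/signup"}),
    tool_call("send_keys", {"by": "id", "value": "email", "text": "qa@example.com"}),
    tool_call("interact", {"action": "click", "by": "css", "value": "button[type=submit]"}),
]

for request in signup_flow:
    print(json.dumps(request))
```

Each printed line is one request; the model decides the order and the locators at runtime, which is exactly the part you used to write by hand.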

The author matters here. Angie Jones is one of the best-known names in test automation (Java Champion, long-time automation educator), and mcp-selenium (MIT, shipped in early 2025) reads like a tester built it: explicit timeouts on every element tool, a diagnostics tool for console and network errors, an accessibility-tree resource for when locator guessing fails.

She also recorded a short demo of the install-to-automation loop, worth two minutes if you prefer watching to reading.

One catalog note: a second, unrelated server also matches the “selenium mcp” search — selenium_mcp by amandeep-sg. This guide covers the angiejones server at /servers/selenium, the widely adopted one; treat the other as an alternative to evaluate, not the default.

Selenium MCP vs Playwright MCP vs Chrome DevTools MCP

Three serious browser-automation MCP servers exist, and they are not interchangeable. The short version: Playwright MCP is the best general-purpose default, Chrome DevTools MCP is the best Chrome-debugging specialist, and Selenium MCP is the right pick when Selenium is already your stack.

  • Playwright MCP (Microsoft) is snapshot-first: it hands the model a structured accessibility snapshot of the page with stable element references, so the model picks elements from a list instead of inventing selectors. That design makes it more reliable on pages the model has never seen, and community benchmarks consistently report faster per-action timings than WebDriver’s HTTP round trips. If you are starting from zero, start here.
  • Chrome DevTools MCP (Chrome team) trades browser breadth for depth: Chrome only, but with performance tracing, network inspection, and DevTools-grade debugging. It is the pick for “why is this page slow” work, not test authoring. We compared these two head-to-head in Chrome DevTools MCP vs Playwright MCP.
  • Selenium MCP speaks locator-first WebDriver. The agent works in id / css / xpath terms — the exact vocabulary of every existing Selenium suite — and drives four real browsers, including Safari through Apple’s own safaridriver, which neither competitor offers as a true native target.

So when does Selenium MCP win? Three situations, all variations of “Selenium is already load-bearing in your org”:

  1. You maintain a Selenium suite. When the agent finds a working locator path through your app, that finding converts line-for-line into your existing Java or Python page objects. An agent exploring via Playwright snapshot references gives you nothing you can paste into a WebDriver test.
  2. Your QA team thinks in WebDriver. The mental model — explicit waits, locator strategies, frame switching, alert handling — maps one-to-one onto this server’s tools. Zero retraining.
  3. You need real Safari coverage. start_browser with browser: "safari" drives actual Safari on macOS, not WebKit-the-engine. For teams whose bug reports say “only happens in Safari,” that distinction is the whole job.
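
As a rough sketch of the "converts line-for-line" claim in the first point: the six strategy names the agent uses map onto the `By` constants in Selenium's Python bindings. The helper name `to_selenium_line` and the `driver` variable are hypothetical; the function only generates source text, so it runs without Selenium installed.

```python
# Map the server's short strategy names to the names of Selenium's Python
# `By` constants (selenium.webdriver.common.by.By).
BY_CONSTANTS = {
    "id": "By.ID",
    "css": "By.CSS_SELECTOR",
    "xpath": "By.XPATH",
    "name": "By.NAME",
    "tag": "By.TAG_NAME",
    "class": "By.CLASS_NAME",
}


def to_selenium_line(by: str, value: str, var: str = "driver") -> str:
    """Turn a locator the agent found into a line for a Python page object."""
    try:
        constant = BY_CONSTANTS[by]
    except KeyError:
        raise ValueError(f"unknown locator strategy: {by!r}") from None
    return f"{var}.find_element({constant}, {value!r})"


print(to_selenium_line("css", "#signup button"))
# prints: driver.find_element(By.CSS_SELECTOR, '#signup button')
```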

The honest caveat: if none of those three describe you, Selenium MCP is the wrong default. Locator-guessing is the weakest part of the design — GitHub issue #14 (“We should not rely on AI models for locators”) argued the snapshot-reference approach is fundamentally more reliable, and the project’s answer was an accessibility-tree resource, not abandoning locators. More in the Limits section.

Install (every client)

The server is stdio-only: your MCP client launches npx -y @angiejones/mcp-selenium@latest as a subprocess. No API keys, no env vars. The install panel below stays in sync with the canonical /servers/selenium catalog entry — pick your client, copy, restart:


Client-specific notes worth knowing:

  • Claude Code: claude mcp add selenium -- npx -y @angiejones/mcp-selenium@latest. Add --scope project to register it in the repo’s .mcp.json so your whole team gets it. See /clients/claude-code.
  • Cursor / Windsurf / Claude Desktop: paste the JSON block from the panel into ~/.cursor/mcp.json, ~/.codeium/windsurf/mcp_config.json, or claude_desktop_config.json respectively. See /clients/cursor for exact paths.
  • Goose: the README’s first-listed client. CLI: goose session --with-extension "npx -y @angiejones/mcp-selenium@latest"; Goose Desktop has a one-click goose://extension deep link in the README.
  • Safari users (macOS): before the first run, execute sudo safaridriver --enable once, then enable Allow Remote Automation under Safari → Settings → Developer. Skip this and start_browser fails with a session-creation error.
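
For the JSON-based clients, a minimal sketch of merging the server entry into an existing config without clobbering other servers. The inner `command`/`args` object is the one quoted in this guide's FAQ; the top-level `mcpServers` key is the layout these clients commonly use and is an assumption here, as is the placeholder `filesystem` entry. Confirm against your client's own file before writing anything to disk.

```python
import json

# The entry itself is from this guide; the surrounding layout is assumed.
SELENIUM_ENTRY = {"command": "npx", "args": ["-y", "@angiejones/mcp-selenium@latest"]}


def add_selenium_server(config: dict) -> dict:
    """Return a copy of an MCP client config with a `selenium` entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is untouched
    servers["selenium"] = SELENIUM_ENTRY
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"filesystem": {"command": "npx", "args": ["-y", "some-other-server"]}}}
print(json.dumps(add_selenium_server(existing), indent=2))
```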

Verify the install by asking the agent: “Open Chrome headless, go to example.com, and tell me the h1 text.” Expected: three tool calls (start_browser, navigate, get_element_text) returning “Example Domain.”

Tools walkthrough

The current release ships 18 tools plus two read-only MCP resources. One history note that prevents real confusion: the server originally sprawled past 30 tools (find_element, click_element, hover… one per action) before the maintainer consolidated the surface in issue #60 — in her words, “now it feels bloated with 30+ tools. Rethink and refactor before next release.” Older tutorials still show the old names; this walkthrough matches the current README.

Session: start_browser, close_session

start_browser takes browser (chrome | firefox | edge | safari) and an optional options: { headless: boolean, arguments: string[] } — the arguments array passes raw browser flags like --window-size=1280,800. close_session tears it down. The server manages one current session; tell the agent to close before switching browsers.
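
A small sketch of building the start_browser arguments described above, with a guard for the one combination this guide says cannot work (Safari plus headless). The helper name `start_browser_args` is hypothetical; the argument shape follows the description in the previous paragraph.

```python
SUPPORTED_BROWSERS = ("chrome", "firefox", "edge", "safari")


def start_browser_args(browser: str, headless: bool = False, arguments: tuple = ()) -> dict:
    """Build the arguments object for a start_browser tool call."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    if browser == "safari" and headless:
        raise ValueError("Safari has no headless mode")
    args: dict = {"browser": browser}
    options: dict = {}
    if headless:
        options["headless"] = True
    if arguments:
        options["arguments"] = list(arguments)  # raw browser flags
    if options:
        args["options"] = options
    return args


print(start_browser_args("chrome", headless=True, arguments=("--window-size=1280,800",)))
```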

Acting: navigate, interact, send_keys, press_key, upload_file

interact is the consolidated mouse tool: action is click, doubleclick, rightclick, or hover, plus a locator (by + value) and an optional timeout (default 10,000 ms; every element tool waits rather than failing instantly, which is classic Selenium explicit-wait behavior). send_keys clears the field first, then types into the element; press_key sends a raw key like Enter or Tab; upload_file feeds an absolute path to a file input.
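
If explicit waits are new to you, this is the general idea in a few lines: poll until the element shows up or the deadline passes. It is a sketch of the pattern, not the server's implementation; `wait_for`, the `find` callback, and the 250 ms poll interval are all illustrative assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(find: Callable[[], Optional[T]], timeout_ms: int = 10_000, poll_ms: int = 250) -> T:
    """Call `find` repeatedly until it returns a value or the timeout expires."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        found = find()
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not found within {timeout_ms} ms")
        time.sleep(poll_ms / 1000)
```

The cost model follows directly: a locator that matches returns as soon as the element appears, while a locator that never matches burns the full timeout before the agent can try again.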

Reading: get_element_text, get_element_attribute, take_screenshot

get_element_text and get_element_attribute (e.g. href, value, class) are how the agent asserts state without burning tokens on screenshots. take_screenshot saves to outputPath or returns base64 image data if you omit it — prefer the path form in long sessions; inline base64 screenshots are the fastest way to blow up your context window.
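
The context-window warning is easy to quantify: base64 encodes every 3 bytes as 4 characters, so an inline screenshot is a third larger than the file on disk. A quick sketch of the arithmetic (the 300,000-byte file size is just an example figure):

```python
import base64


def base64_chars(n_bytes: int) -> int:
    """Length of the padded base64 encoding of `n_bytes` bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity check against the standard library on a small payload.
assert base64_chars(10) == len(base64.b64encode(b"x" * 10))

print(base64_chars(300_000))  # prints: 400000
```

How many tokens that becomes depends on the model and client, but every one of those characters lands in the conversation unless you pass outputPath.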

Context: window, frame, alert, cookies

window lists and switches tabs (list / switch / switch_latest / close), frame moves focus into an iframe and back, and alert accepts, dismisses, reads, or types into native dialogs — three things agent-built browser flows hit constantly and plain DOM tools can’t touch. add_cookie, get_cookies, and delete_cookie round out session management (note: the browser must already be on a page from the cookie’s domain before add_cookie works).
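
The add_cookie precondition can be checked before the call is made. Below is a simplified sketch of the domain rule (exact host or a subdomain of the cookie's domain); it ignores paths, ports, and public-suffix edge cases, and the helper name is hypothetical.

```python
from urllib.parse import urlsplit


def cookie_domain_matches(current_url: str, cookie_domain: str) -> bool:
    """True if the page's host equals the cookie domain or is a subdomain of it."""
    host = (urlsplit(current_url).hostname or "").lower()
    domain = cookie_domain.lstrip(".").lower()
    return bool(host) and (host == domain or host.endswith("." + domain))


print(cookie_domain_matches("https://app.example.com/login", ".example.com"))  # prints: True
print(cookie_domain_matches("about:blank", "example.com"))  # prints: False
```

The second case is the common trap: a freshly started browser sits on a blank page, so the agent has to navigate to the domain before setting the cookie.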

Escape hatches: execute_script, diagnostics

execute_script runs arbitrary JavaScript in the page — the README points at drag-and-drop, scrolling, and computed styles. diagnostics is the standout tool of the current release: it returns console, errors, or network data captured via WebDriver BiDi (Selenium’s bidirectional protocol, auto-enabled when the browser supports it). An agent that fills a form, checks diagnostics for console errors, and reports what it found is doing real QA, not just clicking.

Resources: browser-status:// and accessibility://

Two read-only resources: browser-status://current reports the active session, and accessibility://current returns a compact JSON accessibility-tree snapshot of the page — much smaller than raw HTML, and the direct answer to the locator-reliability criticism. If your client supports MCP resources, tell the agent to read the accessibility tree before guessing selectors.
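
To show why a tree beats guessing, here is a sketch that walks a snapshot and lists the buttons by name. The node shape used here (`role`, `name`, `children`) is a hypothetical stand-in, not the server's documented schema; read a real accessibility://current payload and adjust the keys before using anything like this.

```python
from typing import Iterator


def find_nodes(node: dict, role: str) -> Iterator[dict]:
    """Depth-first search for nodes with the given role."""
    if node.get("role") == role:
        yield node
    for child in node.get("children", []):
        yield from find_nodes(child, role)


# Hypothetical snapshot of a signup form.
snapshot = {
    "role": "form",
    "name": "Sign up",
    "children": [
        {"role": "textbox", "name": "Email"},
        {"role": "checkbox", "name": "I agree to the terms"},
        {"role": "button", "name": "Submit"},
    ],
}

print([n["name"] for n in find_nodes(snapshot, "button")])  # prints: ['Submit']
```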

Recipes

Three workflows where this server earns the install. Each is a prompt you can paste, assuming the server is registered and Chrome is installed.

Recipe 1 — Regression-check a form before merging

“Open Chrome headless. Go to staging.example.com/signup. Fill the form: email [email protected], password from my clipboard placeholder TestPass!234, check the terms checkbox, click Submit. Then: tell me the confirmation text, run diagnostics for console errors, and screenshot the result to /tmp/signup-check.png.”

The agent chains start_browser → navigate → three send_keys → interact (click, twice) → get_element_text → diagnostics → take_screenshot. The diagnostics step is the part a human tester skips at 5 p.m.: silent console errors on a “passing” form are exactly what this catches. If you run this shape of check often, pair the server with the webapp-testing skill, which gives the agent a repeatable test-plan structure instead of ad-hoc clicking.

Recipe 2 — Scrape a page behind a login

“Open Firefox. Go to app.example.com/login, sign in with the EMAIL and PASSWORD I gave you, wait for the dashboard, then read the text of every row in the #invoices table and give me a markdown table of invoice number, date, and amount.”

Standard send_keys login plus get_element_text extraction. Two refinements: if the app keeps you signed in via a session cookie, skip the form entirely — navigate to the domain once, then add_cookie with your session token, which is faster and immune to CAPTCHA. And never paste production credentials into a prompt; use a throwaway test account. Everything you type into the chat transits the model.

Recipe 3 — Cross-browser sanity pass

“For each of chrome, firefox, and safari: start the browser, go to example.com/pricing, screenshot to /tmp/pricing-{browser}.png, read the text of the .plan-card elements, then close the session. Afterwards, tell me if any browser shows different plan names or prices.”

This is the recipe Playwright MCP can’t fully match — its WebKit is the engine, not Safari-the-app with its autofill, tracking prevention, and rendering quirks. The serial start/close loop matters because the server holds one session at a time. Budget extra seconds for Safari: no headless mode, so a real window opens on your Mac.
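
The final comparison step is simple enough to sketch. Assuming the agent has collected the .plan-card texts per browser into a dictionary (the data below is made up for illustration), the check is a diff against one baseline browser:

```python
def differing_browsers(results: dict[str, list[str]], baseline: str = "chrome") -> dict[str, list[str]]:
    """Return the browsers whose plan-card texts differ from the baseline's."""
    expected = results[baseline]
    return {
        browser: texts
        for browser, texts in results.items()
        if browser != baseline and texts != expected
    }


# Hypothetical results from the three serial sessions.
results = {
    "chrome": ["Free $0", "Pro $12", "Team $30"],
    "firefox": ["Free $0", "Pro $12", "Team $30"],
    "safari": ["Free $0", "Pro $12"],
}

print(differing_browsers(results))  # prints: {'safari': ['Free $0', 'Pro $12']}
```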

Limits

What we got wrong when we first wired this up: we assumed it behaved like Playwright MCP with different branding. It doesn’t, and the differences are where the sharp edges live.

  • Locator guessing is the weak link. The tools take raw locators, and models invent selectors that don’t exist — the failure mode GitHub issue #14 called out (“we should not rely on AI models for locators”). The 10-second default timeout means each bad guess costs 10 seconds before the agent retries. Mitigation: have the agent read accessibility://current first, or paste the relevant HTML into the chat so it picks selectors from ground truth.
  • execute_script and navigate accept anything. An open security report (issue #69) flags unvalidated inputs: a malicious page the agent reads can prompt-inject it into navigating to internal endpoints (SSRF — tricking the browser into reaching services only your machine can see, like cloud metadata IPs) or running attacker-chosen JavaScript. Don’t point this server at untrusted pages from a machine with production access, and use a client that requires per-call approval for execute_script.
  • Local, single-session, stdio-only. One browser session at a time, on the machine running the MCP client. Remote Selenium Grid support is not documented — a closed issue suggests it may work against an available WebDriver endpoint, but there is no official grid-URL option as of writing. Your nightly 200-test grid run is not moving here; this is an interactive tool.
  • Slower per action than CDP-based servers. WebDriver is an HTTP protocol with a round trip per command; Playwright and DevTools MCP speak faster native pipes. Community timing comparisons consistently favor Playwright for raw speed. For interactive agent use the gap rarely matters; for 50-step flows it adds up.
  • It will not replace your test suite. Agent-driven exploration is non-deterministic — the same prompt can take different paths. Explore, reproduce, and draft with it; keep CI on scripted Selenium.
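
For the unvalidated-input risk on navigate, one mitigation you can apply outside the server is an allow-list check on URLs before a call is approved. This is a sketch of that idea, not a feature of mcp-selenium; the host names are placeholders, and it does not defend against a permitted hostname that resolves to an internal address.

```python
import ipaddress
from urllib.parse import urlsplit

# Placeholder allow-list: replace with the hosts you actually test against.
ALLOWED_HOSTS = {"staging.example.com", "example.com"}


def is_safe_target(url: str) -> bool:
    """Allow only http(s) URLs whose host is on the allow-list."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: plain allow-list lookup.
        return host in ALLOWED_HOSTS
    # IP literal: refuse internal ranges even if someone allow-listed them.
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return False
    return host in ALLOWED_HOSTS


print(is_safe_target("https://staging.example.com/signup"))  # prints: True
print(is_safe_target("http://169.254.169.254/latest/meta-data/"))  # prints: False
```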

Troubleshooting

start_browser fails / session not created

Almost always a browser/driver mismatch or a missing browser. Confirm the browser is actually installed, update it, and retry — modern Selenium resolves matching drivers automatically via Selenium Manager. On Linux CI boxes, add options: { arguments: ["--no-sandbox"] } for Chrome. For Safari, re-check sudo safaridriver --enable and the Allow Remote Automation setting.

Agent says the Selenium tools aren’t available

Registration didn’t take. Run claude mcp list (or your client’s equivalent) and confirm selenium appears. If it does but tools don’t load, check the client’s MCP log for npx stderr — the usual culprits are an old Node version (use 18+) or a network proxy blocking the npm registry. Pre-install with npm install -g @angiejones/mcp-selenium as a workaround.

Tool names don’t match this guide

You’re on a pre-consolidation version that still exposes find_element, click_element, and friends. Request @angiejones/mcp-selenium@latest explicitly in your config (npx caches aggressively and can keep serving an older copy) and restart the client. The old tools still work on old versions; this guide simply describes the current surface.

Element-not-found loops on dynamic pages

The model is guessing selectors on a page it can’t see. Have it read accessibility://current (if your client supports MCP resources), or take a screenshot first, or paste the form’s HTML into the chat. Raising the per-call timeout helps for slow-rendering SPAs, but it can’t fix a selector that never existed.

Edge sessions misbehaving

Edge is the least-exercised of the four targets — there is an open GitHub issue about Edge-specific launch problems. Update Edge and the server first; if it persists, run the same flow in Chrome (same Chromium engine, near-identical behavior for most checks) and file details on the issue.

FAQ

What is the Selenium MCP server?

It is an open-source MCP server by Angie Jones (github.com/angiejones/mcp-selenium, MIT) that exposes Selenium WebDriver as tools an AI agent can call. Your MCP client — Claude Desktop, Claude Code, Cursor, Goose — launches it via npx, and the agent can then start Chrome, Firefox, Edge, or Safari, navigate, click, type, read text, run JavaScript, and take screenshots through plain-English instructions.

How do I install Selenium MCP in Claude Code?

One command: `claude mcp add selenium -- npx -y @angiejones/mcp-selenium@latest`. No API keys or env vars — the server drives browsers already installed on your machine. For Cursor, Windsurf, and Claude Desktop, paste the equivalent JSON block ({"command": "npx", "args": ["-y", "@angiejones/mcp-selenium@latest"]}) into the client's MCP config and restart.

Should I use Selenium MCP or Playwright MCP?

Starting fresh, pick Playwright MCP — it is snapshot-first, Microsoft-maintained, and generally faster per action. Pick Selenium MCP when your team already runs Selenium: the agent uses the same locator strategies (id, css, xpath) as your existing suite, covers Safari via real safaridriver, and what it discovers translates directly into your Java or Python Selenium code.

Does Selenium MCP support Safari?

Yes, on macOS only. Run `sudo safaridriver --enable` once, then enable "Allow Remote Automation" in Safari → Settings → Developer. Safari has no headless mode, so the window is always visible. Chrome, Firefox, and Edge support both headed and headless runs via the start_browser tool's options parameter.

Can Selenium MCP connect to a remote Selenium Grid?

Not as a documented feature. The README only covers local browsers, and the npm package launches drivers on your machine. A closed GitHub issue (#32) suggests pointing start_browser at an available WebDriver endpoint, but there is no documented grid-URL option as of writing. Treat local browser sessions as the supported path and watch the repo's issues for grid support.

Why does my client list tools like find_element and click_element instead of interact?

You are running an older version. The server originally grew past 30 tools, and the maintainer consolidated them (GitHub issue #60) — click, double-click, right-click, and hover merged into one interact tool, and window, frame, alert, cookie, and diagnostics tools were added. Re-run with @angiejones/mcp-selenium@latest to get the current surface described in this guide.

Is the Selenium MCP server free?

Yes. The server is MIT-licensed and runs entirely on your machine — no account, no API key, no usage billing. The only requirements are Node.js (for npx) and the browsers you want to drive. Driver binaries for Chrome, Firefox, and Edge are resolved automatically by modern Selenium via Selenium Manager.
