Selenium

Name: Selenium
Rating: 4.8 (189 reviews)
Author: angiejones

Automates web browsers through Selenium WebDriver, allowing AI agents to click buttons, fill forms, navigate pages, and interact with websites programmatically.

Automates web browser actions with Selenium WebDriver.


browser automation


What it does

Launch Chrome, Firefox, Edge, or Safari browsers
Navigate to URLs and click elements on web pages
Fill forms and type text into input fields
Extract text content from web page elements
Perform drag-and-drop and hover interactions
Execute right-clicks and double-clicks on elements

Best for

AI agents performing web-based tasks and workflows
Automated testing of web applications
Web scraping and data extraction from interactive sites
Browser-based automation without manual scripting

Works with major browsers including Safari
No manual scripting required - just tell the AI what to do
10+ browser interaction tools

About Selenium

Selenium is a community-built MCP server published by angiejones that provides AI assistants with tools and capabilities via the Model Context Protocol. It automates web browser actions using Selenium WebDriver and is categorized under browser automation. This server exposes 14 tools that AI clients can invoke during conversations and coding sessions.

How to install

You can install Selenium in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

Selenium is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Tools (14)

start_browser

launches browser

navigate

navigates to a URL

find_element

finds an element

click_element

clicks an element

send_keys

sends keys to an element, aka typing

MCP Selenium Server

A Model Context Protocol (MCP) server for Selenium WebDriver — browser automation for AI agents.

Setup

Goose (Desktop)

Paste into your browser address bar:

goose://extension?cmd=npx&arg=-y&arg=%40angiejones%2Fmcp-selenium%40latest&id=selenium-mcp&name=Selenium%20MCP&description=automates%20browser%20interactions

Goose (CLI)

goose session --with-extension "npx -y @angiejones/mcp-selenium@latest"

Claude Code

claude mcp add selenium -- npx -y @angiejones/mcp-selenium@latest

Cursor / Windsurf / other MCP clients

{
  "mcpServers": {
    "selenium": {
      "command": "npx",
      "args": ["-y", "@angiejones/mcp-selenium@latest"]
    }
  }
}
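
For clients that read a JSON config file, the entry above can also be merged in programmatically. The following is a minimal sketch in Python, not part of the project: the helper name `add_selenium_server` is hypothetical, and it assumes the client keeps its servers under a top-level `mcpServers` key as in the snippet above.

```python
import copy
import json


def add_selenium_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the selenium entry added.

    Hypothetical helper: assumes servers live under "mcpServers".
    Entries that already exist are left untouched.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["selenium"] = {
        "command": "npx",
        "args": ["-y", "@angiejones/mcp-selenium@latest"],
    }
    return merged


if __name__ == "__main__":
    # "other" stands in for whatever servers the client already has configured.
    existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_selenium_server(existing), indent=2))
```

Working on a deep copy keeps the caller's config unchanged, so the result can be inspected before it is written back to disk.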

Example Usage

Tell the AI agent of your choice:

Open Chrome, go to github.com/angiejones, and take a screenshot.

The agent will call the server's start_browser, navigate, and take_screenshot tools in turn. No manual scripting or step-by-step directions are needed.
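
Under the hood, an MCP client sends each tool invocation as a JSON-RPC 2.0 `tools/call` request. The sketch below shows what the three calls for the prompt above might look like. The helper `tool_call` is hypothetical, the `https://` scheme on the URL is an assumption, and the exact arguments an agent chooses may differ.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The three calls an agent might issue for the example prompt.
requests = [
    tool_call("start_browser", {"browser": "chrome"}),
    tool_call("navigate", {"url": "https://github.com/angiejones"}),
    tool_call("take_screenshot", {}),
]

if __name__ == "__main__":
    for request in requests:
        print(json.dumps(request))
```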

Supported Browsers

Chrome, Firefox, Edge, and Safari.

Safari note: Requires macOS. Run sudo safaridriver --enable once and enable "Allow Remote Automation" in Safari → Settings → Developer. No headless mode.

Tools

start_browser

Launches a browser session.

Parameter	Type	Required	Description
browser	string	Yes	`chrome`, `firefox`, `edge`, or `safari`
options	object	No	`{ headless: boolean, arguments: string[] }`
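
As an illustration of the parameters above, here is a small Python sketch that assembles a start_browser arguments object. The helper `start_browser_args` is hypothetical, and the Safari check is a client-side guard derived from the Safari note, not a description of how the server itself responds.

```python
SUPPORTED_BROWSERS = ("chrome", "firefox", "edge", "safari")


def start_browser_args(browser, headless=False, arguments=None):
    """Build the arguments object for a start_browser call."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    if browser == "safari" and headless:
        # Client-side guard based on the Safari note: no headless mode.
        raise ValueError("Safari has no headless mode")
    args = {"browser": browser}
    options = {}
    if headless:
        options["headless"] = True
    if arguments:
        options["arguments"] = list(arguments)
    if options:
        # Only include the optional object when something was set.
        args["options"] = options
    return args


if __name__ == "__main__":
    # The window-size flag is only an example of a browser argument.
    print(start_browser_args("chrome", headless=True, arguments=["--window-size=1280,800"]))
```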

navigate

Navigates to a URL.

Parameter	Type	Required	Description
url	string	Yes	URL to navigate to

interact

Performs a mouse action on an element.

Parameter	Type	Required	Description
action	string	Yes	`click`, `doubleclick`, `rightclick`, or `hover`
by	string	Yes	Locator strategy: `id`, `css`, `xpath`, `name`, `tag`, `class`
value	string	Yes	Value for the locator strategy
timeout	number	No	Max wait in ms (default: 10000)

send_keys

Types text into an element. Clears the field first.

Parameter	Type	Required	Description
by	string	Yes	Locator strategy
value	string	Yes	Locator value
text	string	Yes	Text to enter
timeout	number	No	Max wait in ms (default: 10000)
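
The send_keys and interact tools share the same `by` / `value` locator pair, so a form submission is typically a few send_keys calls followed by a click. The sketch below builds such a sequence as plain data. The element locators (`username`, `password`, the submit button selector) and the helper names are hypothetical and would need to match the page being automated.

```python
LOCATOR_STRATEGIES = ("id", "css", "xpath", "name", "tag", "class")


def locator(by, value, timeout=None):
    """Return the shared locator fields, validating the strategy name."""
    if by not in LOCATOR_STRATEGIES:
        raise ValueError(f"unknown locator strategy: {by!r}")
    fields = {"by": by, "value": value}
    if timeout is not None:
        fields["timeout"] = timeout  # milliseconds; the server default is 10000
    return fields


def login_steps(username, password):
    """Tool calls for a hypothetical login form: type twice, then click."""
    return [
        {"name": "send_keys",
         "arguments": {**locator("id", "username"), "text": username}},
        {"name": "send_keys",
         "arguments": {**locator("id", "password"), "text": password}},
        {"name": "interact",
         "arguments": {"action": "click",
                       **locator("css", "button[type='submit']", timeout=5000)}},
    ]


if __name__ == "__main__":
    for step in login_steps("demo", "secret"):
        print(step)
```

Because send_keys clears the field first, the sequence does not need a separate step to empty pre-filled inputs.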

get_element_text

Gets the text content of an element.

Parameter	Type	Required	Description
by	string	Yes	Locator strategy
value	string	Yes	Locator value
timeout	number	No	Max wait in ms (default: 10000)

get_element_attribute

Gets an attribute value from an element.

Parameter	Type	Required	Description
by	string	Yes	Locator strategy
value	string	Yes	Locator value
attribute	string	Yes	Attribute name (e.g., `href`, `value`, `class`)
timeout	number	No	Max wait in ms (default: 10000)

press_key

Presses a keyboard key.

Parameter	Type	Required	Description
key	string	Yes	Key to press (e.g., `Enter`, `Tab`, `a`)

upload_file

Uploads a file via a file input element.

Parameter	Type	Required	Description
by	string	Yes	Locator strategy
value	string	Yes	Locator value
filePath	string	Yes	Absolute path to the file
timeout	number	No	Max wait in ms (default: 10000)

take_screenshot

Captures a screenshot of the current page.

Parameter	Type	Required	Description
outputPath	string	No	Save path. If omitted, returns base64 image data.
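
When `outputPath` is omitted, the client has to decode the returned base64 data itself. A minimal sketch of that step follows. It assumes the image data arrives as a plain base64 string and that the image is a PNG, which is what Selenium WebDriver screenshots normally are; the function names are hypothetical, and the exact shape of the tool response is not specified here.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_screenshot(image_b64: str) -> bytes:
    """Decode base64 image data into raw bytes."""
    return base64.b64decode(image_b64)


def is_png(data: bytes) -> bool:
    """Check the PNG signature before trusting the file extension."""
    return data.startswith(PNG_SIGNATURE)


def save_screenshot(image_b64: str, path: str) -> int:
    """Write the decoded image to disk and return the byte count."""
    data = decode_screenshot(image_b64)
    Path(path).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    # Stand-in payload: just a PNG signature, not a real screenshot.
    sample = base64.b64encode(PNG_SIGNATURE).decode("ascii")
    print(is_png(decode_screenshot(sample)))
```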

close_session

Closes the current browser session. No parameters.

execute_script

Executes JavaScript in the browser. Use for advanced interactions not covered by other tools (e.g., drag and drop, scrolling, reading computed styles, DOM manipulation).

Parameter	Type	Required	Description
script	string	Yes	JavaScript code to execute
args	array	No	Arguments accessible via `arguments[0]`, etc.
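
For example, reading a computed style could be expressed as a script plus two arguments. This sketch assumes the tool runs the script the way Selenium's own script execution does, as a function body in which `return` hands a value back; the selector `h1`, the property `color`, and the helper name are illustrative only.

```python
def execute_script_args(script, *args):
    """Build the arguments object for an execute_script call."""
    payload = {"script": script}
    if args:
        payload["args"] = list(args)  # exposed to the script as arguments[0], ...
    return payload


# JavaScript run in the page: look up an element and read one computed style.
COMPUTED_STYLE_JS = (
    "const el = document.querySelector(arguments[0]);"
    " return window.getComputedStyle(el).getPropertyValue(arguments[1]);"
)

payload = execute_script_args(COMPUTED_STYLE_JS, "h1", "color")

if __name__ == "__main__":
    print(payload)
```

Passing the selector and property through `args` rather than formatting them into the script string avoids quoting problems in the JavaScript source.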

window

Manages browser windows and tabs.

Parameter	Type	Required	Description
action	string	Yes	`list`, `switch`, `switch_latest`, or `close`
handle	string	No	Window handle (required for `switch`)

frame

Switches focus to a frame or back to the main page.

Parameter	Type	Required	Description
action	string	Yes	`switch` or `default`
by	string	No	Locator strategy (for `switch`)
value	string	No	Locator value (for `switch`)
index	number	No	Frame index, 0-based (for `switch`)
timeout	number	No	Max wait in ms (default: 10000)
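
A frame can be targeted either by locator or by index. The sketch below builds the arguments for both forms and for switching back to the main page. The rule that exactly one of the two must be given is an assumption enforced on the client side here, not documented server behavior, and `iframe#checkout` is a made-up selector.

```python
def frame_args(action, by=None, value=None, index=None, timeout=None):
    """Build the arguments object for a frame call."""
    if action == "default":
        return {"action": "default"}  # back to the main page
    if action != "switch":
        raise ValueError(f"unknown frame action: {action!r}")
    has_locator = by is not None and value is not None
    has_index = index is not None  # index 0 is valid, so test against None
    if has_locator == has_index:
        raise ValueError("switch needs either by/value or index, not both")
    args = {"action": "switch"}
    if has_locator:
        args.update(by=by, value=value)
    else:
        args["index"] = index
    if timeout is not None:
        args["timeout"] = timeout
    return args


if __name__ == "__main__":
    print(frame_args("switch", index=0))
    print(frame_args("switch", by="css", value="iframe#checkout"))
    print(frame_args("default"))
```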

alert

Handles browser alert, confirm, or prompt dialogs.

Parameter	Type	Required	Description
action	string	Yes	`accept`, `dismiss`, `get_text`, or `send_text`
text	string	No	Text to send (required for `send_text`)
timeout	number	No	Max wait in ms (default: 5000)

add_cookie

Adds a cookie. Browser must be on a page from the cookie's domain.

Parameter	Type	Required	Description
name	string	Yes	Cookie name
value	string	Yes	Cookie value
domain	string	No	Cookie domain
path	string	No	Cookie path
secure	boolean	No	Secure flag
httpOnly	boolean	No	HTTP-only flag
expiry	number	No	Unix timestamp
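
Since `expiry` is a Unix timestamp, it usually has to be computed from a lifetime. Here is a minimal sketch, assuming the timestamp is in seconds (the usual convention for WebDriver cookies). The helper `cookie_args` and its `days_valid` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone

OPTIONAL_FIELDS = {"domain", "path", "secure", "httpOnly"}


def cookie_args(name, value, days_valid=None, now=None, **fields):
    """Build the arguments object for an add_cookie call."""
    unknown = set(fields) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown cookie fields: {sorted(unknown)}")
    args = {"name": name, "value": value, **fields}
    if days_valid is not None:
        now = now or datetime.now(timezone.utc)
        # Assumed to be seconds since the Unix epoch.
        args["expiry"] = int((now + timedelta(days=days_valid)).timestamp())
    return args


if __name__ == "__main__":
    print(cookie_args("session", "abc123", days_valid=7, path="/", secure=True))
```

The browser must already be on a page from the cookie's domain before the resulting call is sent, so a navigate call would normally come first.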

get_cookies

Gets cookies. Returns all or a specific one by name.

Parameter	Type	Required	Description
name	string	No	Cookie name. Omit for all cookies.

delete_cookie

Deletes cookies. Deletes all or a specific one by name.

Parameter	Type	Required	Description
name	string	No	Cookie name. Omit to delete all.

diagnostics

Gets browser diagnostics captured via WebDriver BiDi (auto-enabled when supported).

Parameter	Type	Required	Description
type	string	Yes	`console`, `errors`, or `network`
clear	boolean	No	Clear buffer after returning (default: false)

Resources

MCP resources provide read-only data that clients can access without calling a tool.

browser-status://current

Returns the current browser session status (active session ID or "no active session").

Property	Value
MIME type	`text/plain`
Requires browser	No

accessibility://current

Returns an accessibility tree snapshot of the current page — a compact, structured JSON representation of interactive elements and text content. Much smaller than full HTML. Useful for understanding page layout and finding elements to interact with.

Property	Value
MIME type	`application/json`
Requires browser	Yes
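
Resources are fetched with the MCP `resources/read` method rather than `tools/call`. The sketch below builds such a request for either URI above; the helper name `read_resource` is hypothetical, and most clients issue these requests on the agent's behalf.

```python
import json

RESOURCE_URIS = ("browser-status://current", "accessibility://current")


def read_resource(uri: str, request_id: int) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `resources/read` method."""
    if uri not in RESOURCE_URIS:
        raise ValueError(f"unknown resource URI: {uri!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


if __name__ == "__main__":
    # The status resource works without a browser; the accessibility one needs a session.
    print(json.dumps(read_resource("browser-status://current", 1)))
    print(json.dumps(read_resource("accessibility://current", 2)))
```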

Development

Setup

git clone https://github.com/angiejones/mcp-selenium.git
cd mcp-selenium
npm install

Run Tests

npm test

Requires Chrome + chromedriver on PATH. Tests run headless.

Install via Smithery

npx -y @smithery/cli install @angiejones/mcp-selenium --client claude

Install globally

npm install -g @angiejones/mcp-selenium
mcp-selenium

License

MIT

Alternatives

Firecrawl

mendableai

89.6k

Unlock AI-ready web data with Firecrawl: scrape any website, handle dynamic content, and automate web scraping for resea

Official, Popular

Browser Use

browser-use

79.9k

Browser Use lets LLMs and agents access and scrape any website in real time, making web scraping and web page scraping e

Official, Popular

Playwright Browser Automation

microsoft

28.4k

Enhance software testing with Playwright MCP: Fast, reliable browser automation, an innovative alternative to Selenium s

Official, Popular

Chrome DevTools MCP

chromedevtools

28.1k

AI-driven control of live Chrome via Chrome DevTools: browser automation, debugging, performance analysis and network mo

Official, Popular

Related Skills


playwright-pro

Production-grade Playwright testing toolkit. Use when the user mentions Playwright tests, end-to-end testing, browser automation, fixing flaky tests, test migration, CI/CD testing, or test suites. Generate tests, fix flaky failures, migrate from Cypress/Selenium, sync with TestRail, run on BrowserStack. 55 templates, 3 agents, smart reporting.

notebooklm

Query Google NotebookLM for source-grounded, citation-backed answers from uploaded documents. Reduces hallucinations through Gemini's document-only responses. Browser automation with library management and persistent authentication.


dev-browser

Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.

chrome-devtools

Browser automation, debugging, and performance analysis using Puppeteer CLI scripts. Use for automating browsers, taking screenshots, analyzing performance, monitoring network traffic, web scraping, form automation, and JavaScript debugging.

qa-tester

"Browser automation QA testing skill. Systematically tests web applications for functionality, security, and usability issues. Reports findings by severity (CRITICAL/HIGH/MEDIUM/LOW) with immediate alerts for critical failures."

browser-automation

Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications. Triggers include "browse", "navigate to", "go to website", "extract data from webpage", "screenshot", "web scraping", "fill out form", "click on", "search for on the web". When taking actions be as specific as possible.