
electron-driver
Drive Electron apps from AI agents. Click, type, drag, screenshot, eval JS. 38 tools, Playwright-powered automation for
MCP server that enables AI agents to drive and automate Electron applications with 38 tools for clicking, typing, screenshots, JavaScript evaluation, and testing workflows.
About electron-driver
electron-driver is a community-built MCP server published by mesomya that provides AI assistants with tools and capabilities via the Model Context Protocol. Drive Electron apps from AI agents. Click, type, drag, screenshot, eval JS. 38 tools, Playwright-powered automation for It is categorized under productivity. This server exposes 20 tools that AI clients can invoke during conversations and coding sessions.
How to install
You can install electron-driver in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
License
electron-driver is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
Tools (20)
Launch an Electron application
Close the running Electron app
Take a full-page PNG screenshot
Click on an element
Fill text input by replacing existing content
electron-driver
Drive Electron apps from AI agents. Click, type, drag, screenshot, evaluate JavaScript in the renderer or main process, read console logs, handle multi-window apps, capture accessibility snapshots — all through an MCP (Model Context Protocol) server that plugs into Claude Code, Claude Desktop, Cursor, and any other MCP-compatible agent host.
https://github.com/user-attachments/assets/a95500d2-28d2-4ee1-9965-8f7e1ef54caa
Built on Playwright's experimental _electron API. Works with any Electron
app — React, Vue, Svelte, vanilla — as long as you can point it at a
compiled main-process entry.
Status: v0.3.0. First public release. 38 tools covering real workflows.
Why this exists
AI agents can reason about what a desktop app should do, but they can't see or interact with one on their own. Web browsers have plenty of agent-automation options; Electron has almost none. This package closes that gap: give an agent the path to your compiled Electron app and it can drive it the same way a human would.
Common use cases:
- An agent verifies a feature it just implemented by actually running the app and checking the visible result
- Visual regression testing during a refactor
- Accessibility audits via ARIA tree snapshots
- Reproducing bugs from a natural-language description
- Teaching a subagent to iterate on UI until a spec passes
Install
Requires Node 18+ and an Electron app you've already built.
npm install electron-driver
You don't need to install Playwright browsers separately — _electron
drives your Electron binary directly.
Register with your agent host
Claude Code (project scope)
Create .mcp.json at the repo root:
{
"mcpServers": {
"electron-driver": {
"command": "npx",
"args": ["electron-driver"]
}
}
}
Claude Code (user scope — available in every project)
claude mcp add electron-driver --scope user -- npx electron-driver
Claude Desktop / Cursor / other
Add to the host's MCP configuration, pointing at npx electron-driver or
the absolute path to node_modules/electron-driver/index.mjs.
Core idea
The server owns exactly one Electron session at a time. start_app
launches it, everything else drives it, stop_app closes it. Screenshots
go to a session directory that is wiped on every start_app — no pileup,
no stale artefacts. All tool calls are logged to
<project>/.electron-driver/driver.log during a session.
Errors carry a stable code field so callers can branch programmatically
without regex-matching prose:
| Code | Meaning |
|---|---|
NOT_RUNNING | A tool needs a running session, but there isn't one |
ALREADY_RUNNING | start_app called while a session exists |
TIMEOUT | An action hit its timeout |
NOT_FOUND | Selector or file didn't match |
FILE_NOT_FOUND | Path-based input points at a missing file |
BAD_ARGUMENT | Arguments failed validation |
UNKNOWN_TOOL | Tool name not recognized |
ERROR | Everything else |
Tools
All 38 tools grouped by purpose. Every selector-based tool uses
Playwright's full selector engine: CSS, text=, role=, [aria-label=],
:has-text(), scoping (main >> button), etc.
Lifecycle
start_app — launch the app. Takes main (absolute path to the
compiled main entry), optional cwd, args, env, screenshotsDir,
timeoutMs. Returns { title, url, viewport, screenshotsDir, logFile }.
Detects the single-instance-lock failure mode and gives a helpful hint
instead of a raw disconnection error.
stop_app — close cleanly. Safe on an already-stopped session.
info — { title, url, viewport: {width, height, devicePixelRatio}, uptimeMs }.
Viewport is populated from window.innerWidth/innerHeight.
Capturing
screenshot — full-page PNG. Pass name (without extension) to
control the filename. Returns { path }.
cleanup_screenshots — wipe the current session's screenshot directory.
console_logs — recent renderer console messages (log/info/warn/error/
debug/pageerror) and main-process stdout/stderr. Rolling 1000-entry buffer.
Filter by source (renderer/main/all), type, and limit. Pass
clear: true to drain after reading.
Interaction
click — click an element. Options: timeoutMs, button
(left/right/middle), clickCount, force (skip actionability
checks), position (click at an offset inside the element).
type — fill a text input, replacing existing content. Fast but
only works on real inputs. For editors/CodeMirror/contenteditables, use
keyboard_type.
keyboard_type — type as real per-character keydown events. Pass
focusSelector to click an element first. Warns in the result if nothing
has focus and no focus selector was passed.
press — press a key or chord: "Escape", "Enter", "Control+S",
"Shift+Tab", "Control+Shift+P".
press_sequence — alias for keyboard_type with no focus selector.
hover — hover over an element. Options: timeoutMs, force.
drag — drag from one point to another using real Chromium input
events (via Playwright's CDP mouse pipeline). Because these are trusted
browser events, Chromium's pointer pipeline generates matching
PointerEvents as a side effect, so React onPointerDown, native
pointerdown listeners, setPointerCapture, CSS :hover/:active, and
every other pointer consumer see the drag exactly as if a real user had
performed it. Coordinates are CSS pixels. Pass detectSelector and the
driver will measure the element before and after the drag and include
detect.moved in the result — the only reliable way to catch drags that
silently hit a min/max clamp.
{
"from": { "x": 275, "y": 400 },
"to": { "x": 420, "y": 400 },
"detectSelector": ".sidebar-resize-handle"
}
If the primary strategy does not move the detect target, the driver
automatically falls back to invoking the React handler directly via
fiber-prop access and dispatching move/up events on both document and
window — covering all known React splitter patterns. Disable the
fallback with fiberFallback: false. The result includes strategy
("pointer-capture" or "react-fiber").
clear_input — empty an input or textarea.
select_option — select from a <select> by value, label, or
index.
check — check a checkbox or radio. Options: timeoutMs, force.
uncheck — uncheck a checkbox.
scroll — scroll a container (pass selector) or the window. Supports
absolute (x, y) or delta (dx, dy).
scroll_into_view — ensure an element is visible. Safe if already is.
drop_file — simulate dropping a file onto a target via synthetic
DragEvents and a reconstructed File with DataTransfer. Works for apps
that read the File via web APIs (FileReader, File.text(), etc). Does
not populate file.path — apps relying on webUtils.getPathForFile()
must use eval_main to invoke their own IPC handler directly.
set_input_files — the correct way to test file upload UI. Sets files
on an <input type="file"> without a native dialog. Much more reliable
than drop_file when the app uses real file inputs.
Waiting
wait — fixed pause in milliseconds. Prefer the others.
wait_for_selector — wait until a selector reaches a state
(attached/detached/visible/hidden). Honours timeoutMs. Returns
count, box, and elapsedMs on success; the error carries
elapsed vs requested on timeout.
wait_for — poll a JavaScript predicate (function body, use return)
until it returns truthy. Options: timeoutMs, pollMs.
Checking & reading state
exists — { exists, count } fast check, no waiting. Accepts the full
selector engine.
get_text — text content of the first match. Accepts the full selector
engine. Returns { exists, text }.
get_attribute — read an HTML attribute by name. Returns
{ exists, value }.
get_value — read an input/textarea/select's current value.
get_bbox — bounding box as { x, y, width, height } in CSS pixels.
Use before dragging or clicking at an offset.
get_computed_style — read one or more computed CSS properties. Pass
a properties array.
elements_list — enumerate elements matching a selector with their
tag, id, classes, text snippet, box, and key attributes. Great for "what
buttons exist on this screen". Capped at 50 by default; tune via limit.
focused_element — what currently has focus, with tag/id/classes/text
and bounding box. Returns { focused: false } if nothing meaningful has
focus.
accessibility_snapshot — capture the ARIA tree as JSON. Useful for a11y
audits and finding elements by role. Pass interestingOnly: false to
include every node. Pass root to snapshot a subtree.
Multi-window
windows_list — every BrowserWindow the app has open, with id,
title, URL, focus/visibility/state flags.
switch_window — route subsequent tool calls to a different window.
Pass index or titleMatch.
Dialogs
dialog_handler — install an auto-responder for JavaScript dialogs
(alert/confirm/prompt/beforeunload). Pass action: "accept" | "dismiss",
optional text for prompt(), and once: true (default) to
auto-uninstall after the first dialog.
Evaluation escape hatches
Both eval_renderer and eval_main use the same contract: pass a
function body, use return to yield a value, supports async/await,
and an optional arg payload is available as the local arg variable.
eval_renderer — evaluate in the renderer (page) context.
{
"js":
---
*README truncated. [View full README on GitHub](https://github.com/mesomya/electron-driver).*
Alternatives
Related Skills
Browse all skillsLeveraging AI coding assistants and tools to boost development productivity, while maintaining oversight to ensure quality results.
Creates educational Teams channel posts for internal knowledge sharing about Claude Code features, tools, and best practices. Applies when writing posts, announcements, or documentation to teach colleagues effective Claude Code usage, announce new features, share productivity tips, or document lessons learned. Provides templates, writing guidelines, and structured approaches emphasizing concrete examples, underlying principles, and connections to best practices like context engineering. Activates for content involving Teams posts, channel announcements, feature documentation, or tip sharing.
Expert methodology for defining, tracking, and interpreting engineering performance metrics including DORA, team health, productivity, and executive reporting.
This skill should be used whenever users request personal assistance tasks such as schedule management, task tracking, reminder setting, habit monitoring, productivity advice, time management, or any query requiring personalized responses based on user preferences and context. On first use, collects comprehensive user information including schedule, working habits, preferences, goals, and routines. Maintains an intelligent database that automatically organizes and prioritizes information, keeping relevant data and discarding outdated context.
Optimize local development workflow with Cursor. Triggers on "cursor workflow", "cursor development loop", "cursor productivity", "cursor daily workflow". Use when working with cursor local dev loop functionality. Trigger with phrases like "cursor local dev loop", "cursor loop", "cursor".
Boost your productivity with automated task management