electron-driver


mesomya

Drive Electron apps from AI agents: click, type, drag, screenshot, and evaluate JavaScript. 38 tools of Playwright-powered automation.

MCP server that enables AI agents to drive and automate Electron applications with 38 tools for clicking, typing, screenshots, JavaScript evaluation, and testing workflows.


About electron-driver

electron-driver is a community-built MCP server, published by mesomya, that provides AI assistants with tools via the Model Context Protocol. It is categorized under productivity. This listing counts 20 tools that AI clients can invoke during conversations and coding sessions; the README below documents all 38.

How to install

You can install electron-driver in any MCP-compatible client, including Cursor, Claude Desktop, and VS Code. The server runs locally on your machine over the stdio transport.

License

electron-driver is released under the MIT license, a permissive open-source license: you can freely use, modify, and distribute the software, provided the copyright and license notice is kept.

Tools (20)

start_app

Launch an Electron application

stop_app

Close the running Electron app

screenshot

Take a full-page PNG screenshot

click

Click on an element

type

Fill a text input, replacing its existing content

electron-driver


Drive Electron apps from AI agents. Click, type, drag, screenshot, evaluate JavaScript in the renderer or main process, read console logs, handle multi-window apps, capture accessibility snapshots — all through an MCP (Model Context Protocol) server that plugs into Claude Code, Claude Desktop, Cursor, and any other MCP-compatible agent host.

https://github.com/user-attachments/assets/a95500d2-28d2-4ee1-9965-8f7e1ef54caa

Built on Playwright's experimental _electron API. Works with any Electron app — React, Vue, Svelte, vanilla — as long as you can point it at a compiled main-process entry.

Status: v0.3.0. First public release. 38 tools covering real workflows.

Why this exists

AI agents can reason about what a desktop app should do, but they can't see or interact with one on their own. Web browsers have plenty of agent-automation options; Electron has almost none. This package closes that gap: give an agent the path to your compiled Electron app and it can drive it the same way a human would.

Common use cases:

  • An agent verifies a feature it just implemented by actually running the app and checking the visible result
  • Visual regression testing during a refactor
  • Accessibility audits via ARIA tree snapshots
  • Reproducing bugs from a natural-language description
  • Teaching a subagent to iterate on UI until a spec passes

Install

Requires Node 18+ and an Electron app you've already built.

npm install electron-driver

You don't need to install Playwright browsers separately — _electron drives your Electron binary directly.

Register with your agent host

Claude Code (project scope)

Create .mcp.json at the repo root:

{
  "mcpServers": {
    "electron-driver": {
      "command": "npx",
      "args": ["electron-driver"]
    }
  }
}

Claude Code (user scope — available in every project)

claude mcp add electron-driver --scope user -- npx electron-driver

Claude Desktop / Cursor / other

Add an entry to the host's MCP configuration that runs npx electron-driver, or that runs node with the absolute path to node_modules/electron-driver/index.mjs.

Core idea

The server owns exactly one Electron session at a time. start_app launches it, everything else drives it, stop_app closes it. Screenshots go to a session directory that is wiped on every start_app — no pileup, no stale artefacts. All tool calls are logged to <project>/.electron-driver/driver.log during a session.

Errors carry a stable code field so callers can branch programmatically without regex-matching prose:

| Code | Meaning |
|---|---|
| NOT_RUNNING | A tool needs a running session, but there isn't one |
| ALREADY_RUNNING | start_app called while a session exists |
| TIMEOUT | An action hit its timeout |
| NOT_FOUND | Selector or file didn't match |
| FILE_NOT_FOUND | Path-based input points at a missing file |
| BAD_ARGUMENT | Arguments failed validation |
| UNKNOWN_TOOL | Tool name not recognized |
| ERROR | Everything else |
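A minimal sketch of what "branch programmatically" can look like in a TypeScript agent loop. It assumes the client has already parsed a failed tool result into an object with code and message fields; the ToolError and NextStep types, the nextStep helper, and the chosen policy per code are all hypothetical, not part of electron-driver.

```typescript
// The codes listed in the error table.
type DriverErrorCode =
  | "NOT_RUNNING"
  | "ALREADY_RUNNING"
  | "TIMEOUT"
  | "NOT_FOUND"
  | "FILE_NOT_FOUND"
  | "BAD_ARGUMENT"
  | "UNKNOWN_TOOL"
  | "ERROR";

// Assumed shape of a failed tool result once the client has parsed it.
interface ToolError {
  code: DriverErrorCode;
  message: string;
}

// Hypothetical next actions for an agent loop.
type NextStep = "start_app" | "stop_then_start" | "retry" | "fix_call" | "give_up";

// Branch on the stable code; the message text is only for humans.
function nextStep(err: ToolError): NextStep {
  switch (err.code) {
    case "NOT_RUNNING":
      return "start_app"; // no session yet: launch one
    case "ALREADY_RUNNING":
      return "stop_then_start"; // one possible policy: relaunch fresh
    case "TIMEOUT":
    case "NOT_FOUND":
      return "retry"; // the UI may simply not be ready yet
    case "FILE_NOT_FOUND":
    case "BAD_ARGUMENT":
    case "UNKNOWN_TOOL":
      return "fix_call"; // retrying the same call will fail again
    default:
      return "give_up"; // ERROR: anything else
  }
}

console.log(nextStep({ code: "TIMEOUT", message: "click timed out" })); // retry
```

Because the switch is over a string-literal union, the compiler flags any code the table adds later that the loop does not handle explicitly only if you drop the default branch; keeping it trades that check for a safe fallback.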

Tools

All 38 tools grouped by purpose. Every selector-based tool uses Playwright's full selector engine: CSS, text=, role=, [aria-label=], :has-text(), scoping (main >> button), etc.

Lifecycle

start_app — launch the app. Takes main (absolute path to the compiled main entry), optional cwd, args, env, screenshotsDir, timeoutMs. Returns { title, url, viewport, screenshotsDir, logFile }. Detects the single-instance-lock failure mode and gives a helpful hint instead of a raw disconnection error.
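A minimal sketch of assembling start_app arguments on the client side before sending the tool call. The field names come from the description above; their value types, the helpers isAbsolutePath and buildStartAppArgs, and the example paths are hypothetical. Checking that main is absolute locally turns a likely mistake into an immediate error instead of a server round trip.

```typescript
// Field names from the start_app description; value types are assumptions.
interface StartAppArgs {
  main: string;
  cwd?: string;
  args?: string[];
  env?: Record<string, string>;
  screenshotsDir?: string;
  timeoutMs?: number;
}

// Accept POSIX absolute paths ("/...") and Windows drive paths ("C:\..." or "C:/...").
function isAbsolutePath(p: string): boolean {
  return p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p);
}

// Hypothetical helper: validate, then return the payload for the tool call.
function buildStartAppArgs(main: string, extra: Omit<StartAppArgs, "main"> = {}): StartAppArgs {
  if (!isAbsolutePath(main)) {
    throw new Error(`start_app needs an absolute path to the compiled main entry, got "${main}"`);
  }
  if (extra.timeoutMs !== undefined && extra.timeoutMs <= 0) {
    throw new Error("timeoutMs must be positive");
  }
  return { main, ...extra };
}

// Example paths are made up for illustration.
const payload = buildStartAppArgs("/home/me/my-app/dist/main.js", {
  cwd: "/home/me/my-app",
  timeoutMs: 30000,
});
console.log(JSON.stringify(payload));
```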

stop_app — close cleanly. Safe on an already-stopped session.

info returns { title, url, viewport: {width, height, devicePixelRatio}, uptimeMs }. The viewport is populated from window.innerWidth/innerHeight.

Capturing

screenshot — full-page PNG. Pass name (without extension) to control the filename. Returns { path }.

cleanup_screenshots — wipe the current session's screenshot directory.

console_logs returns recent renderer console messages (log/info/warn/error/debug/pageerror) and main-process stdout/stderr from a rolling 1000-entry buffer. Filter by source (renderer/main/all), type, and limit. Pass clear: true to drain after reading.
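To make the buffer semantics concrete, here is a small in-process model of a rolling log with the same query options (source, type, limit, clear). It is an illustration of the described behavior, not electron-driver's implementation: the entry shape, the "stdout"/"stderr" type tags for main-process lines, "limit keeps the newest matches", and "clear empties the whole buffer" are all assumptions.

```typescript
type LogSource = "renderer" | "main";

// Renderer types come from the console_logs description; tagging main-process
// lines as "stdout" / "stderr" is an assumption.
type LogType = "log" | "info" | "warn" | "error" | "debug" | "pageerror" | "stdout" | "stderr";

interface LogEntry {
  source: LogSource;
  type: LogType;
  text: string;
}

interface LogQuery {
  source?: LogSource | "all";
  type?: LogType;
  limit?: number;
  clear?: boolean;
}

// Fixed-capacity log: once full, each push evicts the oldest entry.
class RollingLog {
  private entries: LogEntry[] = [];
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  push(entry: LogEntry): void {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) this.entries.shift();
  }

  read(q: LogQuery = {}): LogEntry[] {
    let out = this.entries.filter(
      (e) =>
        (q.source === undefined || q.source === "all" || e.source === q.source) &&
        (q.type === undefined || e.type === q.type),
    );
    // Keep only the most recent `limit` matches.
    if (q.limit !== undefined) out = q.limit > 0 ? out.slice(-q.limit) : [];
    // This sketch drains the whole buffer, not just the returned entries.
    if (q.clear) this.entries = [];
    return out;
  }
}

const log = new RollingLog(3);
log.push({ source: "renderer", type: "log", text: "a" });
log.push({ source: "main", type: "stderr", text: "b" });
log.push({ source: "renderer", type: "error", text: "c" });
log.push({ source: "renderer", type: "warn", text: "d" }); // evicts "a"
console.log(log.read({ source: "renderer" }).map((e) => e.text)); // c, d
```

The practical takeaway for agents: a long session can silently evict early messages, so reading with clear: true after each step keeps every batch of logs tied to the action that produced it.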

Interaction

click — click an element. Options: timeoutMs, button (left/right/middle), clickCount, force (skip actionability checks), position (click at an offset inside the element).

type fills a text input, replacing its existing content. It is fast but only works on real inputs; for code editors such as CodeMirror and for contenteditable elements, use keyboard_type.

keyboard_type — type as real per-character keydown events. Pass focusSelector to click an element first. Warns in the result if nothing has focus and no focus selector was passed.

press — press a key or chord: "Escape", "Enter", "Control+S", "Shift+Tab", "Control+Shift+P".

press_sequence — alias for keyboard_type with no focus selector.

hover — hover over an element. Options: timeoutMs, force.

drag — drag from one point to another using real Chromium input events (via Playwright's CDP mouse pipeline). Because these are trusted browser events, Chromium's pointer pipeline generates matching PointerEvents as a side effect, so React onPointerDown, native pointerdown listeners, setPointerCapture, CSS :hover/:active, and every other pointer consumer see the drag exactly as if a real user had performed it. Coordinates are CSS pixels. Pass detectSelector and the driver will measure the element before and after the drag and include detect.moved in the result — the only reliable way to catch drags that silently hit a min/max clamp.

{
  "from": { "x": 275, "y": 400 },
  "to":   { "x": 420, "y": 400 },
  "detectSelector": ".sidebar-resize-handle"
}

If the primary strategy does not move the detect target, the driver automatically falls back to invoking the React handler directly via fiber-prop access and dispatching move/up events on both document and window — covering all known React splitter patterns. Disable the fallback with fiberFallback: false. The result includes strategy ("pointer-capture" or "react-fiber").
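A sketch of how a client might derive drag coordinates from a get_bbox result and judge whether anything moved. The helpers dragFromCenter and boxMoved, the 0.5 px tolerance, and the sample box are hypothetical; the driver computes detect.moved itself, and this only illustrates the idea that a drag stopped by a min/max clamp leaves the detected element's box unchanged. With the sample box, the payload matches the drag example above.

```typescript
// Bounding box in CSS pixels, as get_bbox reports it.
interface Box { x: number; y: number; width: number; height: number; }
interface Point { x: number; y: number; }
interface DragArgs { from: Point; to: Point; detectSelector?: string; }

// Start at the centre of the element and move by (dx, dy), all in CSS pixels.
function dragFromCenter(box: Box, dx: number, dy: number, detectSelector?: string): DragArgs {
  const from: Point = { x: box.x + box.width / 2, y: box.y + box.height / 2 };
  const to: Point = { x: from.x + dx, y: from.y + dy };
  const args: DragArgs = { from, to };
  if (detectSelector !== undefined) args.detectSelector = detectSelector;
  return args;
}

// Treat the element as moved if position or size changed beyond sub-pixel jitter.
function boxMoved(before: Box, after: Box, tolerancePx = 0.5): boolean {
  return (
    Math.abs(after.x - before.x) > tolerancePx ||
    Math.abs(after.y - before.y) > tolerancePx ||
    Math.abs(after.width - before.width) > tolerancePx ||
    Math.abs(after.height - before.height) > tolerancePx
  );
}

// A made-up 10x40 resize handle whose centre sits at (275, 400).
const handle: Box = { x: 270, y: 380, width: 10, height: 40 };
console.log(JSON.stringify(dragFromCenter(handle, 145, 0, ".sidebar-resize-handle")));
```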

clear_input — empty an input or textarea.

select_option — select from a <select> by value, label, or index.

check — check a checkbox or radio. Options: timeoutMs, force.

uncheck — uncheck a checkbox.

scroll — scroll a container (pass selector) or the window. Supports absolute (x, y) or delta (dx, dy).

scroll_into_view ensures an element is visible. It is safe to call if the element is already visible.

drop_file simulates dropping a file onto a target via synthetic DragEvents and a reconstructed File with DataTransfer. It works for apps that read the File via web APIs (FileReader, File.text(), etc.). It does not populate file.path, so apps relying on webUtils.getPathForFile() must use eval_main to invoke their own IPC handler directly.

set_input_files — the correct way to test file upload UI. Sets files on an <input type="file"> without a native dialog. Much more reliable than drop_file when the app uses real file inputs.

Waiting

wait pauses for a fixed number of milliseconds. Prefer the condition-based waits below.

wait_for_selector — wait until a selector reaches a state (attached/detached/visible/hidden). Honours timeoutMs. Returns count, box, and elapsedMs on success; the error carries elapsed vs requested on timeout.

wait_for — poll a JavaScript predicate (function body, use return) until it returns truthy. Options: timeoutMs, pollMs.
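The polling contract of wait_for can be modeled in a few lines. This is a local sketch, not the tool: the real wait_for evaluates a function body string inside the app, whereas waitFor below takes an ordinary callback. The default timeoutMs and pollMs values are illustrative, and the error text only imitates the "elapsed vs requested" style mentioned for wait_for_selector.

```typescript
// Poll `predicate` every `pollMs` until it returns a truthy value or
// `timeoutMs` elapses. Defaults here are illustrative, not the tool's.
async function waitFor<T>(
  predicate: () => T | Promise<T>,
  { timeoutMs = 5000, pollMs = 100 }: { timeoutMs?: number; pollMs?: number } = {},
): Promise<T> {
  const start = Date.now();
  for (;;) {
    const value = await predicate();
    if (value) return value;
    const elapsed = Date.now() - start;
    if (elapsed >= timeoutMs) {
      // Report elapsed vs requested time, in the spirit of wait_for_selector's error.
      throw new Error(`TIMEOUT after ${elapsed}ms (requested ${timeoutMs}ms)`);
    }
    // Never sleep past the deadline.
    await new Promise((resolve) => setTimeout(resolve, Math.min(pollMs, timeoutMs - elapsed)));
  }
}

let calls = 0;
waitFor(() => ++calls >= 3, { timeoutMs: 1000, pollMs: 10 }).then(() =>
  console.log(`ready after ${calls} checks`),
);
```

Returning the truthy value, rather than just true, lets a predicate double as a query: the caller gets back whatever the condition found.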

Checking & reading state

exists returns { exists, count }: a fast check with no waiting. Accepts the full selector engine.

get_text — text content of the first match. Accepts the full selector engine. Returns { exists, text }.

get_attribute — read an HTML attribute by name. Returns { exists, value }.

get_value — read an input/textarea/select's current value.

get_bbox — bounding box as { x, y, width, height } in CSS pixels. Use before dragging or clicking at an offset.

get_computed_style — read one or more computed CSS properties. Pass a properties array.

elements_list — enumerate elements matching a selector with their tag, id, classes, text snippet, box, and key attributes. Great for "what buttons exist on this screen". Capped at 50 by default; tune via limit.

focused_element — what currently has focus, with tag/id/classes/text and bounding box. Returns { focused: false } if nothing meaningful has focus.

accessibility_snapshot — capture the ARIA tree as JSON. Useful for a11y audits and finding elements by role. Pass interestingOnly: false to include every node. Pass root to snapshot a subtree.

Multi-window

windows_list — every BrowserWindow the app has open, with id, title, URL, focus/visibility/state flags.

switch_window — route subsequent tool calls to a different window. Pass index or titleMatch.
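When titleMatch is not specific enough (for example, several windows share a title), a client could compute an index itself from windows_list and pass it to switch_window. A sketch under loud assumptions: the WindowInfo field names beyond id, title, and url are guesses at the "focus/visibility/state flags", and treating the array position in windows_list as switch_window's index is assumed, not documented. indexByUrl is a hypothetical helper.

```typescript
// Assumed shape of one windows_list entry; focused/visible names are guesses.
interface WindowInfo {
  id: number;
  title: string;
  url: string;
  focused: boolean;
  visible: boolean;
}

// Hypothetical helper: pick a window by URL instead of title, skipping hidden ones.
// Assumes switch_window's `index` is the position in the windows_list array.
function indexByUrl(windows: WindowInfo[], urlPart: string): number {
  const i = windows.findIndex((w) => w.visible && w.url.includes(urlPart));
  if (i === -1) throw new Error(`NOT_FOUND: no visible window with URL containing "${urlPart}"`);
  return i;
}

// Made-up windows for illustration.
const wins: WindowInfo[] = [
  { id: 1, title: "Main", url: "app://index.html", focused: true, visible: true },
  { id: 2, title: "Settings", url: "app://settings.html", focused: false, visible: true },
];
console.log(JSON.stringify({ index: indexByUrl(wins, "settings") })); // {"index":1}
```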

Dialogs

dialog_handler — install an auto-responder for JavaScript dialogs (alert/confirm/prompt/beforeunload). Pass action: "accept" | "dismiss", optional text for prompt(), and once: true (default) to auto-uninstall after the first dialog.

Evaluation escape hatches

Both eval_renderer and eval_main use the same contract: pass a function body, use return to yield a value, supports async/await, and an optional arg payload is available as the local arg variable.
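The README does not say how the driver compiles these bodies. One plausible mechanism, sketched here in plain Node, is JavaScript's AsyncFunction constructor: it turns a string into an async function, so the body may use return and await, and naming a parameter arg exposes the payload as a local variable. compileBody and the sample body are hypothetical; the real tools run the body inside the renderer or main process, not in the client.

```typescript
// Look up the AsyncFunction constructor from inside a string, so a compiler
// that downlevels async functions cannot swap it for a plain Function.
const AsyncFunction = new Function(
  "return Object.getPrototypeOf(async function () {}).constructor;",
)() as new (...params: string[]) => (arg: unknown) => Promise<unknown>;

// Compile a function body that may use `return`, `await`, and a local `arg`.
function compileBody(body: string): (arg: unknown) => Promise<unknown> {
  return new AsyncFunction("arg", body);
}

// A body in the documented style: `return` yields the value, `arg` carries the payload.
const body = `
  await new Promise((r) => setTimeout(r, 10));
  return arg.items.filter((n) => n > arg.min).length;
`;

compileBody(body)({ items: [1, 5, 9], min: 4 }).then((result) => console.log(result)); // 2
```

This also explains why a body without return yields undefined: it is a function body, not an expression.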

eval_renderer — evaluate in the renderer (page) context.

{
  "js":

---

*README truncated. [View full README on GitHub](https://github.com/mesomya/electron-driver).*
