Computer Control

Name: Computer Control
Rating: 4.6 (124 reviews)
Author: ab498

Provides desktop automation capabilities including mouse control, keyboard input, screen capture, and OCR text recognition. Works directly with your computer's GUI without external dependencies.

Enables desktop automation through mouse control, keyboard input, screenshots, OCR, and window management for direct interaction with graphical user interfaces

120436 views18Local (stdio)

developer tools

GitHub

What it does

Control mouse movements and clicks
Send keyboard input and keystrokes
Take desktop screenshots
Extract text from images using OCR
Manage windows and applications
Automate GUI interactions

Best for

Desktop automation and testingGUI workflow automationScreen scraping and data extractionRepetitive task automation

Zero external dependenciesBuilt-in OCR capabilityDirect GUI interaction

About Computer Control

Computer Control is a community-built MCP server published by ab498 that provides AI assistants with tools and capabilities via the Model Context Protocol. Automate desktop tasks with Computer Control: mouse, keyboard, screenshots, OCR & window management. Power Automate Desk It is categorized under developer tools.

How to install

You can install Computer Control in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

Computer Control is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Computer Control MCP

MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

    Link to nextjs-boilerplate-ashy-nine-64.vercel.app

MCP Computer Control Demo

Quick Usage (MCP Setup Using `uvx`)

Note: Running uvx computer-control-mcp@latest for the first time will download python dependencies (around 70MB) which may take some time. Recommended to run this in a terminal before using it as MCP. Subsequent runs will be instant.

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"]
    }
  }
}

OR install globally with pip:

pip install computer-control-mcp

Then run the server with:

computer-control-mcp # instead of uvx computer-control-mcp, so you can use the latest version, also you can `uv cache clean` to clear the cache and `uvx` again to use latest version.

Features

Control mouse movements and clicks
Type text at the current cursor position
Take screenshots of the entire screen or specific windows with optional saving to downloads directory
Extract text from screenshots using OCR (Optical Character Recognition)
List and activate windows
Press keyboard keys
Drag and drop operations
Enhanced screenshot capture for GPU-accelerated windows (Windows only)

Note on GPU-accelerated Windows

Traditional screenshot methods like GDI/PrintWindow fail to capture GPU-accelerated windows, resulting in black screens. This impacts games, media players, Electron apps, browsers with GPU acceleration, streaming software, and CAD tools. Use WGC through take_screenshot tool's flag or ENV variable

Configuration

Custom Screenshot Directory

By default, screenshots are saved to the OS downloads directory. You can customize this by setting the COMPUTER_CONTROL_MCP_SCREENSHOT_DIR environment variable:

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"],
      "env": {
        "COMPUTER_CONTROL_MCP_SCREENSHOT_DIR": "C:\\Users\\YourName\\Pictures\\Screenshots"
      }
    }
  }
}

Or set it system-wide:

# Windows (PowerShell)
$env:COMPUTER_CONTROL_MCP_SCREENSHOT_DIR = "C:\Users\YourName\Pictures\Screenshots"

# macOS/Linux
export COMPUTER_CONTROL_MCP_SCREENSHOT_DIR="/home/yourname/Pictures/Screenshots"

If the specified directory doesn't exist, the server will fall back to the default downloads directory.

Automatic WGC for Specific Windows

You can configure the system to automatically use Windows Graphics Capture (WGC) for specific windows by setting the COMPUTER_CONTROL_MCP_WGC_PATTERNS environment variable. This variable should contain comma-separated patterns that match window titles:

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"],
      "env": {
        "COMPUTER_CONTROL_MCP_WGC_PATTERNS": "obs, discord, game, steam"
      }
    }
  }
}

Or set it system-wide:

# Windows (PowerShell)
$env:COMPUTER_CONTROL_MCP_WGC_PATTERNS = "obs, discord, game, steam"

# macOS/Linux
export COMPUTER_CONTROL_MCP_WGC_PATTERNS="obs, discord, game, steam"

When this variable is set, any window whose title contains any of the specified patterns will automatically use WGC for screenshot capture, eliminating black screens for GPU-accelerated applications.

Available Tools

Mouse Control

click_screen(x: int, y: int): Click at specified screen coordinates
move_mouse(x: int, y: int): Move mouse cursor to specified coordinates
drag_mouse(from_x: int, from_y: int, to_x: int, to_y: int, duration: float = 0.5): Drag mouse from one position to another
mouse_down(button: str = "left"): Hold down a mouse button ('left', 'right', 'middle')
mouse_up(button: str = "left"): Release a mouse button ('left', 'right', 'middle')

Keyboard Control

type_text(text: str): Type the specified text at current cursor position
press_key(key: str): Press a specified keyboard key
key_down(key: str): Hold down a specific keyboard key until released
key_up(key: str): Release a specific keyboard key
press_keys(keys: Union[str, List[Union[str, List[str]]]]): Press keyboard keys (supports single keys, sequences, and combinations)

Screen and Window Management

take_screenshot(title_pattern: str = None, use_regex: bool = False, threshold: int = 60, scale_percent_for_ocr: int = None, save_to_downloads: bool = False, use_wgc: bool = False): Capture screen or window
take_screenshot_with_ocr(title_pattern: str = None, use_regex: bool = False, threshold: int = 10, scale_percent_for_ocr: int = None, save_to_downloads: bool = False): Extract adn return text with coordinates using OCR from screen or window
get_screen_size(): Get current screen resolution
list_windows(): List all open windows
activate_window(title_pattern: str, use_regex: bool = False, threshold: int = 60): Bring specified window to foreground
wait_milliseconds(milliseconds: int): Wait for a specified number of milliseconds

Development

Setting up the Development Environment

# Clone the repository
git clone https://github.com/AB498/computer-control-mcp.git
cd computer-control-mcp

# Build/Run:

# 1. Install in development mode | Meaning that your edits to source code will be reflected in the installed package.
pip install -e .

# Then Start server | This is equivalent to `uvx computer-control-mcp@latest` just the local code is used
computer-control-mcp

# -- OR --

# 2. Build after `pip install hatch` | This needs version increment in orer to reflect code changes
hatch build

# Windows
$latest = Get-ChildItem .\dist\*.whl | Sort-Object LastWriteTime -Descending | Select-Object -First 1
pip install $latest.FullName --upgrade 

# Non-windows
pip install dist/*.whl --upgrade

# Run
computer-control-mcp

Running Tests

python -m pytest

API Reference

See the API Reference for detailed information about the available functions and classes.

License

MIT

For more information or help

Alternatives

Chrome DevTools MCP

chromedevtools

28.1k

AI-driven control of live Chrome via Chrome DevTools: browser automation, debugging, performance analysis and network mo

OfficialPopular

50711

Chrome DevTools

chromedevtools

28.1k

Use Chrome DevTools for web site test speed, debugging, and performance analysis. The essential chrome developer tools f

OfficialPopular

3.9k172

GitHub

github

27.6k

Extend your developer tools with GitHub MCP Server for advanced automation, supporting GitHub Student and student packag

OfficialRemotePopular

4.5k232

Repomix

yamadashy

22.3k

Optimize your codebase for AI with Repomix—transform, compress, and secure repos for easier analysis with modern AI tool

OfficialPopular

1.0k7

Related Skills

Browse all skills

pmbok-project-management

Comprehensive PMP/PMBOK project management methodologies and best practices. Use this skill when users need guidance on project management processes, templates, knowledge areas, process groups, tools, techniques, or certification preparation. Covers all 10 PMBOK Knowledge Areas and 5 Process Groups with practical templates, frameworks, and industry-standard approaches. Includes risk management, stakeholder engagement, schedule management, cost control, quality assurance, and resource planning.

ui-design-system

UI design system toolkit for Senior UI Designer including design token generation, component documentation, responsive design calculations, and developer handoff tools. Use for creating design systems, maintaining visual consistency, and facilitating design-dev collaboration.

computer-use-agents

Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.

computer-use

Full desktop computer use for headless Linux servers and VPS. Creates a virtual display (Xvfb + XFCE) to control GUI applications without a physical monitor. Screenshots, mouse clicks, keyboard input, scrolling, dragging — all 17 standard actions. Includes VNC setup for live remote viewing and interaction. Model-agnostic, works with any LLM.

ai-sdk

Answer questions about the AI SDK and help build AI-powered features. Use when developers: (1) Ask about AI SDK functions like generateText, streamText, ToolLoopAgent, embed, or tools, (2) Want to build AI agents, chatbots, RAG systems, or text generation features, (3) Have questions about AI providers (OpenAI, Anthropic, Google, etc.), streaming, tool calling, structured output, or embeddings, (4) Use React hooks like useChat or useCompletion. Triggers on: "AI SDK", "Vercel AI SDK", "generateText", "streamText", "add AI to my app", "build an agent", "tool calling", "structured output", "useChat".

desktop

Electron desktop development guide. Use when implementing desktop features, IPC handlers, controllers, preload scripts, window management, menu configuration, or Electron-specific functionality. Triggers on desktop app development, Electron IPC, or desktop local tools implementation.

What it does

Best for

About Computer Control

How to install

License

Computer Control MCP

MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

Quick Usage (MCP Setup Using uvx)

Features

Note on GPU-accelerated Windows

Configuration

Custom Screenshot Directory

Automatic WGC for Specific Windows

Available Tools

Mouse Control

Keyboard Control

Screen and Window Management

Development

Setting up the Development Environment

Running Tests

API Reference

License

For more information or help

Alternatives

Chrome DevTools MCP

Chrome DevTools

GitHub

Repomix

Related Skills

Quick Usage (MCP Setup Using `uvx`)