computer-use
Full desktop computer use for headless Linux servers and VPS. Creates a virtual display (Xvfb + XFCE) to control GUI applications without a physical monitor. Screenshots, mouse clicks, keyboard input, scrolling, dragging — all 17 standard actions. Includes VNC setup for live remote viewing and interaction. Model-agnostic, works with any LLM.
Install
mkdir -p .claude/skills/computer-use && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8237" && unzip -o skill.zip -d .claude/skills/computer-use && rm skill.zip
Installs to .claude/skills/computer-use
About this skill
Computer Use Skill
Full desktop GUI control for headless Linux servers. Creates a virtual display (Xvfb + XFCE) so you can run and control desktop applications on VPS/cloud instances without a physical monitor.
Environment
- Display: :99
- Resolution: 1024x768 (XGA, Anthropic recommended)
- Desktop: XFCE4 (minimal — xfwm4 + panel only)
Quick Setup
Run the setup script to install everything (systemd services, flicker-free VNC):
./scripts/setup-vnc.sh
This installs:
- Xvfb virtual display on :99
- Minimal XFCE desktop (xfwm4 + panel, no xfdesktop)
- x11vnc with stability flags
- noVNC for browser access
All services auto-start on boot and auto-restart on crash.
Actions Reference
| Action | Script | Arguments | Description |
|---|---|---|---|
| screenshot | screenshot.sh | — | Capture screen → base64 PNG |
| cursor_position | cursor_position.sh | — | Get current mouse X,Y |
| mouse_move | mouse_move.sh | x y | Move mouse to coordinates |
| left_click | click.sh | x y left | Left click at coordinates |
| right_click | click.sh | x y right | Right click |
| middle_click | click.sh | x y middle | Middle click |
| double_click | click.sh | x y double | Double click |
| triple_click | click.sh | x y triple | Triple click (select line) |
| left_click_drag | drag.sh | x1 y1 x2 y2 | Drag from start to end |
| left_mouse_down | mouse_down.sh | — | Press mouse button |
| left_mouse_up | mouse_up.sh | — | Release mouse button |
| type | type_text.sh | "text" | Type text (50 char chunks, 12ms delay) |
| key | key.sh | "combo" | Press key (Return, ctrl+c, alt+F4) |
| hold_key | hold_key.sh | "key" secs | Hold key for duration |
| scroll | scroll.sh | dir amt [x y] | Scroll up/down/left/right |
| wait | wait.sh | seconds | Wait then screenshot |
| zoom | zoom.sh | x1 y1 x2 y2 | Cropped region screenshot |
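One way to wire the table into an agent loop is a small dispatcher that turns an action name into a script invocation. The sketch below covers only the pointer actions. The function name action_cmd and the SCRIPTS_DIR variable are assumptions for illustration; the script names and argument order are taken from the table.

```shell
# Hypothetical dispatcher: prints the command line for a pointer action.
# action_cmd and SCRIPTS_DIR are illustrative names, not part of the skill.
SCRIPTS_DIR="${SCRIPTS_DIR:-./scripts}"

action_cmd() {
  action="$1"
  shift
  case "$action" in
    mouse_move)
      echo "$SCRIPTS_DIR/mouse_move.sh $1 $2" ;;
    left_click|right_click|middle_click|double_click|triple_click)
      # click.sh takes the button as its third argument: strip "_click".
      echo "$SCRIPTS_DIR/click.sh $1 $2 ${action%_click}" ;;
    left_click_drag)
      echo "$SCRIPTS_DIR/drag.sh $1 $2 $3 $4" ;;
    *)
      echo "unsupported action: $action" >&2
      return 1 ;;
  esac
}

# Example: print the command for a double click at the screen centre.
action_cmd double_click 512 384
```

Printing the command instead of running it keeps the mapping easy to inspect. Actions that carry free text (type, key) would need proper quoting and are left out here.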
Usage Examples
export DISPLAY=:99
# Take screenshot
./scripts/screenshot.sh
# Click at coordinates
./scripts/click.sh 512 384 left
# Type text
./scripts/type_text.sh "Hello world"
# Press key combo
./scripts/key.sh "ctrl+s"
# Scroll down
./scripts/scroll.sh down 5
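The Actions Reference describes the screenshot as a base64 PNG. Assuming screenshot.sh writes that base64 text to standard output (the source does not say so explicitly), a sketch for saving it to a file and checking that the result really is a PNG could look like this. save_png is a hypothetical helper name.

```shell
# Hypothetical helper: decode base64 from stdin into a file and verify
# that the file starts with the 8-byte PNG signature.
save_png() {
  out="$1"
  base64 -d > "$out" || return 1
  sig=$(head -c 8 "$out" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Usage (assumes screenshot.sh prints base64 to stdout):
#   ./scripts/screenshot.sh | save_png /tmp/screen.png && echo "saved"
```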
Workflow Pattern
- Screenshot — Always start by seeing the screen
- Analyze — Identify UI elements and coordinates
- Act — Click, type, scroll
- Screenshot — Verify result
- Repeat
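The loop above can be sketched as a wrapper that captures the screen, runs one action, and captures the screen again. The names run and step, the DRY_RUN switch, and SCRIPTS_DIR are assumptions for illustration. The analysis step is left to whatever model or person reads the screenshots.

```shell
SCRIPTS_DIR="${SCRIPTS_DIR:-./scripts}"

# Run a command, or just print it when DRY_RUN is set (useful off-server).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One iteration: screenshot, act, screenshot.
step() {
  run "$SCRIPTS_DIR/screenshot.sh"   # see the current state
  run "$@"                           # act on it
  run "$SCRIPTS_DIR/screenshot.sh"   # verify the result
}

# Example, printing the commands instead of executing them:
DRY_RUN=1 step "$SCRIPTS_DIR/click.sh" 512 384 left
```

Since most actions already capture a screenshot after a short delay, the final capture may be redundant for those actions and could be dropped.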
Tips
- Screen is 1024x768, origin (0,0) at top-left
- Click to focus before typing in text fields
- Use ctrl+End to jump to page bottom in browsers
- Most actions auto-screenshot after 2 sec delay
- Long text is chunked (50 chars) with 12ms keystroke delay
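The chunking mentioned in the last tip can be approximated as follows. This is a sketch of the idea, not the contents of type_text.sh: the function names are hypothetical, and it assumes xdotool (which is in the package list) does the typing via its --delay option. It handles single-line ASCII text only.

```shell
# Split one line of text into pieces of at most 50 characters, one per line.
chunk_text() {
  printf '%s\n' "$1" | fold -w 50
}

# Type each chunk with a 12 ms delay between keystrokes.
# "--" ends option parsing so a chunk starting with "-" is typed literally.
type_chunked() {
  chunk_text "$1" | while IFS= read -r chunk; do
    xdotool type --delay 12 -- "$chunk"
  done
}

# Usage on the server (with DISPLAY=:99 exported):
#   type_chunked "some long line of text"
```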
Live Desktop Viewing (VNC)
Watch the desktop in real-time via browser or VNC client.
Connect via Browser
# SSH tunnel (run on your local machine)
ssh -L 6080:localhost:6080 your-server
# Open in browser
http://localhost:6080/vnc.html
Connect via VNC Client
# SSH tunnel
ssh -L 5900:localhost:5900 your-server
# Connect VNC client to localhost:5900
SSH Config (recommended)
Add to ~/.ssh/config for automatic tunneling:
Host your-server
HostName your.server.ip
User your-user
LocalForward 6080 127.0.0.1:6080
LocalForward 5900 127.0.0.1:5900
Then just ssh your-server and VNC is available.
System Services
# Check status
systemctl status xvfb xfce-minimal x11vnc novnc
# Restart if needed
sudo systemctl restart xvfb xfce-minimal x11vnc novnc
Service Chain
xvfb → xfce-minimal → x11vnc → novnc
- xvfb: Virtual display :99 (1024x768x24)
- xfce-minimal: Watchdog that runs xfwm4+panel, kills xfdesktop
- x11vnc: VNC server with -noxdamage for stability
- novnc: WebSocket proxy with heartbeat for connection stability
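Because each service depends on the one before it, a failure is easiest to locate by checking the chain in order. The sketch below assumes the four unit names listed above and uses systemctl is-active. is_active and first_inactive are hypothetical helper names.

```shell
# Thin wrapper so the check can be swapped out (for example in tests).
is_active() {
  systemctl is-active --quiet "$1"
}

# Print the first service in the chain that is not active.
# Returns 0 if the whole chain is up, 1 otherwise.
first_inactive() {
  for svc in xvfb xfce-minimal x11vnc novnc; do
    if ! is_active "$svc"; then
      echo "$svc"
      return 1
    fi
  done
  return 0
}

# Usage on the server:
#   broken=$(first_inactive) || echo "first failing service: $broken"
```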
Opening Applications
export DISPLAY=:99
# Chrome — only use --no-sandbox if the kernel lacks user namespace support.
# Check: cat /proc/sys/kernel/unprivileged_userns_clone
# 1 = sandbox works, do NOT use --no-sandbox
# 0 = sandbox fails, --no-sandbox required as fallback
# Using --no-sandbox when unnecessary causes instability and crashes.
if [ "$(cat /proc/sys/kernel/unprivileged_userns_clone 2>/dev/null)" = "0" ]; then
google-chrome --no-sandbox &
else
google-chrome &
fi
xfce4-terminal & # Terminal
thunar & # File manager
Note: Snap browsers (Firefox, Chromium) have sandbox issues on headless servers. Use Chrome .deb instead:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f
Manual Setup
If you prefer to install the packages yourself before running setup-vnc.sh:
# Install packages
sudo apt install -y xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 x11vnc novnc websockify
# Run the setup script (generates systemd services, masks xfdesktop, starts everything)
./scripts/setup-vnc.sh
For a fully manual setup, read setup-vnc.sh: it generates all the systemd service files inline, so it contains the exact service definitions.
Troubleshooting
VNC shows black screen
- Check if xfwm4 is running: pgrep xfwm4
- Restart desktop: sudo systemctl restart xfce-minimal
VNC flickering/flashing
- Ensure xfdesktop is masked (check /usr/bin/xfdesktop)
- xfdesktop causes flicker due to clear→draw cycles on Xvfb
VNC disconnects frequently
- Check noVNC has --heartbeat 30 flag
- Check x11vnc has -noxdamage flag
x11vnc crashes (SIGSEGV)
- Add -noxdamage -noxfixes flags
- The DAMAGE extension causes crashes on Xvfb
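Several of the checks above come down to whether a running process was started with a given flag. A minimal sketch, assuming pgrep -a (from procps) is available to list full command lines. has_flag is a hypothetical helper name.

```shell
# Succeeds if the command line in $1 contains the flag in $2
# as a whole, space-delimited word (or word sequence).
has_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage on the server (tr joins multiple matching processes onto one line):
#   cmdline=$(pgrep -a x11vnc | tr '\n' ' ')
#   has_flag "$cmdline" -noxdamage || echo "x11vnc is running without -noxdamage"
#   pgrep xfwm4 >/dev/null || echo "xfwm4 is not running"
```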
Requirements
Installed by setup-vnc.sh:
xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 x11vnc novnc websockify