browse
Complete guide for creating and deploying browser automation functions using the stagehand CLI
Install
mkdir -p .claude/skills/browse && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3613" && unzip -o skill.zip -d .claude/skills/browse && rm skill.zip
Installs to .claude/skills/browse
About this skill
Browser Automation
Automate browser interactions using the browse CLI with Claude.
Setup check
Before running any browser commands, verify the CLI is available:
which browse || npm install -g @browserbasehq/browse-cli
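The check above can be wrapped in a small function so that a script stops early when the CLI is missing. This is a minimal sketch: the name ensure_browse is hypothetical, and it uses command -v as a portable stand-in for which.

```shell
# ensure_browse: hypothetical helper, not part of the browse CLI.
# Returns 0 if `browse` is already available; otherwise runs the npm
# install from the setup check and verifies the command afterwards.
ensure_browse() {
  if command -v browse >/dev/null 2>&1; then
    return 0
  fi
  echo "browse not found, installing @browserbasehq/browse-cli" >&2
  npm install -g @browserbasehq/browse-cli || return 1
  command -v browse >/dev/null 2>&1
}
```

A script could call ensure_browse || exit 1 before its first browse command.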
Environment Selection (Local vs Remote)
The CLI automatically selects between local and remote browser environments based on available configuration:
Local mode (default)
- Uses local Chrome — no API keys needed
- Best for: development, simple pages, trusted sites with no bot protection
Remote mode (Browserbase)
- Activated when BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID are set
- Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence
- Use remote mode when: the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
- Get credentials at https://browserbase.com/settings
When to choose which
- Simple browsing (docs, wikis, public APIs): local mode is fine
- Protected sites (login walls, CAPTCHAs, anti-scraping): use remote mode
- If local mode fails with bot detection or access denied: switch to remote mode
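To predict which mode a session is likely to start in, the activation rule above can be mirrored in a few lines of shell. This is only a sketch of the documented rule, with a hypothetical function name; browse env remains the authoritative way to see the mode actually in use.

```shell
# expected_mode: hypothetical helper that mirrors the documented rule.
# Prints "remote" only when both Browserbase variables are non-empty,
# otherwise "local". It never calls the CLI.
expected_mode() {
  if [ -n "${BROWSERBASE_API_KEY:-}" ] && [ -n "${BROWSERBASE_PROJECT_ID:-}" ]; then
    echo remote
  else
    echo local
  fi
}
```

Setting only one of the two variables still yields local, which matches the requirement that both be set.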
Commands
All commands work identically in both modes. The daemon auto-starts on first command.
Navigation
browse open <url> # Go to URL (alias: goto)
browse reload # Reload current page
browse back # Go back in history
browse forward # Go forward in history
Page state (prefer snapshot over screenshot)
browse snapshot # Get accessibility tree with element refs (fast, structured)
browse screenshot [path] # Take visual screenshot (slow, uses vision tokens)
browse get url # Get current URL
browse get title # Get page title
browse get text <selector> # Get text content (use "body" for all text)
browse get html <selector> # Get HTML content of element
browse get value <selector> # Get form field value
Use browse snapshot as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use browse screenshot when you need visual context (layout, images, debugging).
Interaction
browse click <ref> # Click element by ref from snapshot (e.g., @0-5)
browse type <text> # Type text into focused element
browse fill <selector> <value> # Fill input and press Enter
browse select <selector> <values...> # Select dropdown option(s)
browse press <key> # Press key (Enter, Tab, Escape, Cmd+A, etc.)
browse drag <fromX> <fromY> <toX> <toY> # Drag from one point to another
browse scroll <x> <y> <deltaX> <deltaY> # Scroll at coordinates
browse highlight <selector> # Highlight element on page
browse is visible <selector> # Check if element is visible
browse is checked <selector> # Check if element is checked
browse wait <type> [arg] # Wait for: load, selector, timeout
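As an illustration of how the interaction commands combine, the sketch below submits a search form and reads the resulting page title. The function name and the "#q" selector are hypothetical placeholders, and it assumes browse wait load takes no extra argument; adjust both to the page being automated.

```shell
# search_site: hypothetical helper. "#q" stands in for the real
# selector of the search input on the target page.
search_site() {
  url=$1
  query=$2
  browse open "$url" || return 1
  browse wait load || return 1
  # Per the command list, fill also presses Enter, which submits the form
  browse fill "#q" "$query" || return 1
  browse wait load || return 1
  browse get title
}
```

For example, search_site "https://example.com" "hello world" would print the title of whatever page the form submission leads to.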
Session management
browse stop # Stop the browser daemon
browse status # Check daemon status (includes env)
browse env # Show current environment (local or remote)
browse env local # Switch to local Chrome
browse env remote # Switch to Browserbase (requires API keys)
browse pages # List all open tabs
browse tab_switch <index> # Switch to tab by index
browse tab_close [index] # Close tab
Typical workflow
1. browse open <url>: navigate to the page
2. browse snapshot: read the accessibility tree to understand page structure and get element refs
3. browse click <ref> / browse type <text> / browse fill <selector> <value>: interact using refs from snapshot
4. browse snapshot: confirm the action worked
5. Repeat steps 3-4 as needed
6. browse stop: close the browser when done
Quick Example
browse open https://example.com
browse snapshot # see page structure + element refs
browse click @0-5 # click element with ref 0-5
browse get title
browse stop
Mode Comparison
| Feature | Local | Browserbase |
|---|---|---|
| Speed | Faster | Slightly slower |
| Setup | Chrome required | API key required |
| Stealth mode | No | Yes (custom Chromium, anti-bot fingerprinting) |
| CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) |
| Residential proxies | No | Yes (201 countries, geo-targeting) |
| Session persistence | No | Yes (cookies/auth persist across sessions) |
| Best for | Development/simple pages | Protected sites, bot detection, production scraping |
Best Practices
- Always run browse open before interacting
- Use browse snapshot to check page state; it's fast and gives you element refs
- Only screenshot when visual context is needed (layout checks, images, debugging)
- Use refs from snapshot to click/interact, e.g., browse click @0-5
- Run browse stop when done to clean up the browser session
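The last practice is easy to forget when a step fails partway through. One way to make cleanup unconditional is a small wrapper; this is a sketch with a hypothetical name, not something the CLI provides.

```shell
# with_browser: hypothetical wrapper. Runs the given command, then
# always calls `browse stop`, and returns the command's exit status.
with_browser() {
  rc=0
  "$@" || rc=$?
  browse stop >/dev/null 2>&1
  return "$rc"
}
```

A task defined as a shell function (for example one that runs browse open and browse snapshot) can then be run as with_browser my_task, and the browser session is stopped whether the task succeeds or not.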
Troubleshooting
- "No active page": Run browse stop, then check browse status. If it still says running, kill the zombie daemon with pkill -f "browse.*daemon", then retry browse open
- Chrome not found: Install Chrome or use browse env remote
- Action fails: Run browse snapshot to see available elements and their refs
- Browserbase fails: Verify API key and project ID are set
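The "No active page" recovery steps can be scripted roughly as follows. The exact text printed by browse status is not documented here, so the match on the word "running" is an assumption, and the function name is hypothetical.

```shell
# recover_and_open: hypothetical helper following the "No active page" steps.
# ASSUMPTION: `browse status` prints "running" while the daemon is up
# (and possibly "not running" otherwise). Check the real output first.
recover_and_open() {
  browse stop >/dev/null 2>&1
  if browse status 2>/dev/null | grep -i "running" | grep -qvi "not running"; then
    # The daemon survived the stop: kill it as the troubleshooting note suggests
    pkill -f "browse.*daemon"
  fi
  browse open "$1"
}
```

Calling recover_and_open <url> reopens the page after the cleanup; if the status wording differs, only the grep patterns need to change.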
Switching to Remote Mode
Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.
Don't switch for simple sites (docs, wikis, public APIs, localhost).
browse env remote # switch to Browserbase
browse env local # switch back to local Chrome
The switch is sticky until you run browse stop or switch again.
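The detection rule above can be approximated in a script by inspecting the page text after opening it. The sketch below covers only two of the listed signals (an empty page and the "Checking your browser" message); both function names are hypothetical, and real block pages vary, so treat the check as a heuristic.

```shell
# looks_blocked: hypothetical check on page text read from stdin.
# Succeeds when the text is empty or contains the bot-check message.
looks_blocked() {
  text=$(cat)
  [ -z "$text" ] && return 0
  printf '%s\n' "$text" | grep -qi "checking your browser"
}

# open_with_fallback: open in the current mode, then retry once in
# remote mode if the page looks blocked.
open_with_fallback() {
  url=$1
  browse open "$url"
  if browse get text body | looks_blocked; then
    browse env remote || return 1
    browse open "$url"
  fi
}
```

Because the switch is sticky, later commands stay in remote mode until browse env local or browse stop is run.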
For detailed examples, see EXAMPLES.md. For API reference, see REFERENCE.md.