browse

Complete guide for creating and deploying browser automation functions using the stagehand CLI

Install

mkdir -p .claude/skills/browse && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3613" && unzip -o skill.zip -d .claude/skills/browse && rm skill.zip

Installs to .claude/skills/browse

About this skill

Browser Automation

Automate browser interactions using the browse CLI with Claude.

Setup check

Before running any browser commands, verify the CLI is available:

which browse || npm install -g @browserbasehq/browse-cli
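
The same check can be wrapped in a small helper, sketched below. `ensure_cli` is a hypothetical name, not part of the browse CLI. It uses `command -v` instead of `which` because `command -v` is specified by POSIX and also finds shell functions and aliases.

```shell
# ensure_cli is a hypothetical helper, not something the browse CLI ships.
# It installs an npm package globally only when the command is missing.
ensure_cli() {
  cmd=$1   # command name to look for, e.g. browse
  pkg=$2   # npm package that provides it
  if command -v "$cmd" >/dev/null 2>&1; then
    return 0   # already available, nothing to do
  fi
  npm install -g "$pkg"
}

# Usage (package name taken from the setup check above):
#   ensure_cli browse @browserbasehq/browse-cli
```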

Environment Selection (Local vs Remote)

The CLI automatically selects between local and remote browser environments based on available configuration:

Local mode (default)

Uses local Chrome — no API keys needed
Best for: development, simple pages, trusted sites with no bot protection

Remote mode (Browserbase)

Activated when BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID are set
Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence
Use remote mode when: the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
Get credentials at https://browserbase.com/settings
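
As a rough illustration of the rule above, the sketch below reports which mode the current shell's configuration would be expected to select. `expected_browse_env` is a hypothetical helper, and the CLI's real selection logic may consider more than these two variables; `browse env` remains the authoritative answer.

```shell
# Hypothetical helper: mirrors the documented rule that remote mode is
# activated when both Browserbase variables are set. This is an assumption
# about the rule's shape, not the CLI's actual implementation.
expected_browse_env() {
  if [ -n "${BROWSERBASE_API_KEY:-}" ] && [ -n "${BROWSERBASE_PROJECT_ID:-}" ]; then
    echo remote
  else
    echo local
  fi
}
```

Running `expected_browse_env` before `browse open` can help explain why a session unexpectedly started in local mode, for example when only one of the two variables is set.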

When to choose which

Simple browsing (docs, wikis, public APIs): local mode is fine
Protected sites (login walls, CAPTCHAs, anti-scraping): use remote mode
If local mode fails with bot detection or access denied: switch to remote mode

Commands

All commands work identically in both modes. The daemon auto-starts on first command.

Navigation

browse open <url>                        # Go to URL (alias: goto)
browse reload                            # Reload current page
browse back                              # Go back in history
browse forward                           # Go forward in history

Page state (prefer snapshot over screenshot)

browse snapshot                          # Get accessibility tree with element refs (fast, structured)
browse screenshot [path]                 # Take visual screenshot (slow, uses vision tokens)
browse get url                           # Get current URL
browse get title                         # Get page title
browse get text <selector>               # Get text content (use "body" for all text)
browse get html <selector>               # Get HTML content of element
browse get value <selector>              # Get form field value

Use browse snapshot as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use browse screenshot when you need visual context (layout, images, debugging).
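
When a snapshot is long, it can help to list only the element refs. The sketch below assumes refs appear in the output in the `@0-5` form used in the command examples; the exact snapshot format is not described here, so treat the pattern as an assumption and adjust it to what your version prints. `extract_refs` is a hypothetical helper.

```shell
# Hypothetical helper: reads snapshot text on stdin and prints each
# distinct ref once, in order of first appearance.
# ASSUMPTION: refs look like @<number>-<number>, e.g. @0-5.
extract_refs() {
  grep -oE '@[0-9]+-[0-9]+' | awk '!seen[$0]++'
}

# Usage:
#   browse snapshot | extract_refs
```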

Interaction

browse click <ref>                       # Click element by ref from snapshot (e.g., @0-5)
browse type <text>                       # Type text into focused element
browse fill <selector> <value>           # Fill input and press Enter
browse select <selector> <values...>     # Select dropdown option(s)
browse press <key>                       # Press key (Enter, Tab, Escape, Cmd+A, etc.)
browse drag <fromX> <fromY> <toX> <toY>  # Drag from one point to another
browse scroll <x> <y> <deltaX> <deltaY> # Scroll at coordinates
browse highlight <selector>              # Highlight element on page
browse is visible <selector>             # Check if element is visible
browse is checked <selector>             # Check if element is checked
browse wait <type> [arg]                 # Wait for: load, selector, timeout

Session management

browse stop                              # Stop the browser daemon
browse status                            # Check daemon status (includes env)
browse env                               # Show current environment (local or remote)
browse env local                         # Switch to local Chrome
browse env remote                        # Switch to Browserbase (requires API keys)
browse pages                             # List all open tabs
browse tab_switch <index>                # Switch to tab by index
browse tab_close [index]                 # Close tab

Typical workflow

1. browse open <url>: navigate to the page
2. browse snapshot: read the accessibility tree to understand page structure and get element refs
3. browse click <ref> / browse type <text> / browse fill <selector> <value>: interact using refs from snapshot
4. browse snapshot: confirm the action worked
5. Repeat steps 3-4 as needed
6. browse stop: close the browser when done
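
The workflow above can be sketched as a single shell function that always runs `browse stop`, even when an earlier step fails. `run_workflow` and the `BROWSE_CMD` variable are hypothetical; `BROWSE_CMD` exists only so the sequence can be dry-run with `echo` before touching a real browser.

```shell
# BROWSE_CMD is a hypothetical override; set BROWSE_CMD=echo for a dry run.
BROWSE_CMD="${BROWSE_CMD:-browse}"

# run_workflow <url> <ref>: open, snapshot, click, snapshot, then stop.
run_workflow() {
  url=$1
  ref=$2
  status=0
  # Stop at the first failing step and remember its exit code.
  "$BROWSE_CMD" open "$url" &&
    "$BROWSE_CMD" snapshot &&
    "$BROWSE_CMD" click "$ref" &&
    "$BROWSE_CMD" snapshot || status=$?
  # Always clean up the browser session, whatever happened above.
  "$BROWSE_CMD" stop
  return "$status"
}
```

In real use the ref comes from reading the first snapshot, so an interactive session would run the steps one at a time; the function mainly shows the ordering and the cleanup on failure.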

Quick Example

browse open https://example.com
browse snapshot                          # see page structure + element refs
browse click @0-5                        # click element with ref 0-5
browse get title
browse stop

Mode Comparison

Feature	Local	Browserbase
Speed	Faster	Slightly slower
Setup	Chrome required	API key required
Stealth mode	No	Yes (custom Chromium, anti-bot fingerprinting)
CAPTCHA solving	No	Yes (automatic reCAPTCHA/hCaptcha)
Residential proxies	No	Yes (201 countries, geo-targeting)
Session persistence	No	Yes (cookies/auth persist across sessions)
Best for	Development/simple pages	Protected sites, bot detection, production scraping

Best Practices

Always run browse open before interacting with a page
Use browse snapshot to check page state: it is fast and gives you element refs
Only take a screenshot when visual context is needed (layout checks, images, debugging)
Use refs from the snapshot to click and interact, e.g., browse click @0-5
Run browse stop when done to clean up the browser session

Troubleshooting

"No active page": Run browse stop, then check browse status. If it still says running, kill the zombie daemon with pkill -f "browse.*daemon", then retry browse open
Chrome not found: Install Chrome or use browse env remote
Action fails: Run browse snapshot to see available elements and their refs
Browserbase fails: Verify API key and project ID are set
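
The "No active page" recovery steps can be sketched as one function. `recover_no_active_page` and `BROWSE_CMD` are hypothetical names, and the format of `browse status` output is an assumption: the sketch only looks for the word "running" and treats "not running" or "stopped" as a stopped daemon.

```shell
# BROWSE_CMD is a hypothetical override so the steps can be stubbed out.
BROWSE_CMD="${BROWSE_CMD:-browse}"

# recover_no_active_page <url>: stop, kill a leftover daemon if needed, reopen.
recover_no_active_page() {
  url=$1
  "$BROWSE_CMD" stop || true
  status_out=$("$BROWSE_CMD" status 2>/dev/null || true)
  # ASSUMPTION: the status text contains "running" while the daemon is alive.
  case "$status_out" in
    *"not running"*|*stopped*) ;;                     # daemon is down
    *running*) pkill -f "browse.*daemon" || true ;;   # zombie daemon
  esac
  "$BROWSE_CMD" open "$url"
}
```

Check what `browse status` prints on your install before relying on the pattern match.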

Switching to Remote Mode

Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.

Don't switch for simple sites (docs, wikis, public APIs, localhost).

browse env remote            # switch to Browserbase
browse env local             # switch back to local Chrome

The switch is sticky until you run browse stop or switch again.
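
The detection signals listed above can be expressed as a small predicate. `should_switch_to_remote` is a hypothetical helper; how you obtain the HTTP status is up to you, and the page text could come from `browse get text body`. The text markers are matched case-sensitively and are only the examples named in this section.

```shell
# Hypothetical helper: returns 0 (true) when the signals suggest bot
# protection and remote mode is worth trying, non-zero otherwise.
# $1 = HTTP status code, $2 = visible page text.
should_switch_to_remote() {
  status=$1
  body=$2
  case "$status" in
    403|429) return 0 ;;   # access denied or rate limited
  esac
  case "$body" in
    *"Checking your browser"*|*reCAPTCHA*|*hCaptcha*|*Turnstile*) return 0 ;;
  esac
  # An empty page on a site that should have content is also a signal.
  [ -z "$body" ]
}

# Usage:
#   should_switch_to_remote 200 "$(browse get text body)" && browse env remote
```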

For detailed examples, see EXAMPLES.md. For API reference, see REFERENCE.md.

Author

openclaw