browser-use

143
44
Source

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, or extract information from web pages.

Install

mkdir -p .claude/skills/browser-use && curl -L -o skill.zip "https://mcp.directory/api/skills/download/560" && unzip -o skill.zip -d .claude/skills/browser-use && rm skill.zip

Installs to .claude/skills/browser-use

About this skill

Browser Automation with browser-use CLI

The browser-use command provides fast, persistent browser automation. It maintains browser sessions across commands, enabling complex multi-step workflows.

Installation

# Run without installing (recommended for one-off use)
uvx "browser-use[cli]" open https://example.com

# Or install permanently
uv pip install "browser-use[cli]"

# Install browser dependencies (Chromium)
browser-use install

Quick Start

browser-use open https://example.com           # Navigate to URL
browser-use state                              # Get page elements with indices
browser-use click 5                            # Click element by index
browser-use type "Hello World"                 # Type text
browser-use screenshot                         # Take screenshot
browser-use close                              # Close browser

Core Workflow

  1. Navigate: browser-use open <url> - Opens URL (starts browser if needed)
  2. Inspect: browser-use state - Returns clickable elements with indices
  3. Interact: Use indices from state to interact (browser-use click 5, browser-use input 3 "text")
  4. Verify: browser-use state or browser-use screenshot to confirm actions
  5. Repeat: Browser stays open between commands

Browser Modes

browser-use --browser chromium open <url>      # Default: headless Chromium
browser-use --browser chromium --headed open <url>  # Visible Chromium window
browser-use --browser real open <url>          # User's Chrome with login sessions
browser-use --browser remote open <url>        # Cloud browser (requires API key)
  • chromium: Fast, isolated, headless by default
  • real: Uses your Chrome with cookies, extensions, logged-in sessions
  • remote: Cloud-hosted browser with proxy support (requires BROWSER_USE_API_KEY)

Commands

Navigation

browser-use open <url>                    # Navigate to URL
browser-use back                          # Go back in history
browser-use scroll down                   # Scroll down
browser-use scroll up                     # Scroll up

Page State

browser-use state                         # Get URL, title, and clickable elements
browser-use screenshot                    # Take screenshot (outputs base64)
browser-use screenshot path.png           # Save screenshot to file
browser-use screenshot --full path.png    # Full page screenshot

Interactions (use indices from browser-use state)

browser-use click <index>                 # Click element
browser-use type "text"                   # Type text into focused element
browser-use input <index> "text"          # Click element, then type text
browser-use keys "Enter"                  # Send keyboard keys
browser-use keys "Control+a"              # Send key combination
browser-use select <index> "option"       # Select dropdown option

Tab Management

browser-use switch <tab>                  # Switch to tab by index
browser-use close-tab                     # Close current tab
browser-use close-tab <tab>               # Close specific tab

JavaScript & Data

browser-use eval "document.title"         # Execute JavaScript, return result
browser-use extract "all product prices"  # Extract data using LLM (requires API key)

Cookies

browser-use cookies get                   # Get all cookies
browser-use cookies get --url <url>       # Get cookies for specific URL
browser-use cookies set <name> <value>    # Set a cookie
browser-use cookies set name val --domain .example.com --secure --http-only
browser-use cookies clear                 # Clear all cookies
browser-use cookies clear --url <url>     # Clear cookies for specific URL
browser-use cookies export <file>         # Export all cookies to JSON file
browser-use cookies export <file> --url <url>  # Export cookies for specific URL
browser-use cookies import <file>         # Import cookies from JSON file

Wait Conditions

browser-use wait selector "h1"            # Wait for element to be visible
browser-use wait selector ".loading" --state hidden  # Wait for element to disappear
browser-use wait selector "#btn" --state attached    # Wait for element in DOM
browser-use wait text "Success"           # Wait for text to appear
browser-use wait selector "h1" --timeout 5000  # Custom timeout in ms

Additional Interactions

browser-use hover <index>                 # Hover over element (triggers CSS :hover)
browser-use dblclick <index>              # Double-click element
browser-use rightclick <index>            # Right-click element (context menu)

Information Retrieval

browser-use get title                     # Get page title
browser-use get html                      # Get full page HTML
browser-use get html --selector "h1"      # Get HTML of specific element
browser-use get text <index>              # Get text content of element
browser-use get value <index>             # Get value of input/textarea
browser-use get attributes <index>        # Get all attributes of element
browser-use get bbox <index>              # Get bounding box (x, y, width, height)

Python Execution (Persistent Session)

browser-use python "x = 42"               # Set variable
browser-use python "print(x)"             # Access variable (outputs: 42)
browser-use python "print(browser.url)"   # Access browser object
browser-use python --vars                 # Show defined variables
browser-use python --reset                # Clear Python namespace
browser-use python --file script.py       # Execute Python file

The Python session maintains state across commands. The browser object provides:

  • browser.url - Current page URL
  • browser.title - Page title
  • browser.goto(url) - Navigate
  • browser.click(index) - Click element
  • browser.type(text) - Type text
  • browser.screenshot(path) - Take screenshot
  • browser.scroll() - Scroll page
  • browser.html - Get page HTML

Agent Tasks (Requires API Key)

browser-use run "Fill the contact form with test data"    # Run AI agent
browser-use run "Extract all product prices" --max-steps 50

Agent tasks use an LLM to autonomously complete complex browser tasks. Requires BROWSER_USE_API_KEY or configured LLM API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc).

Session Management

browser-use sessions                      # List active sessions
browser-use close                         # Close current session
browser-use close --all                   # Close all sessions

Profile Management

browser-use profile list-local            # List local Chrome profiles

Before opening a real browser (--browser real), always ask the user if they want to use a specific Chrome profile or no profile. Use profile list-local to show available profiles:

browser-use profile list-local
# Output: Default: Person 1 ([email protected])
#         Profile 1: Work ([email protected])

# With a specific profile (has that profile's cookies/logins)
browser-use --browser real --profile "Profile 1" open https://gmail.com

# Without a profile (fresh browser, no existing logins)
browser-use --browser real open https://gmail.com

# Headless mode (no visible window) - useful for cookie export
browser-use --browser real --profile "Default" cookies export /tmp/cookies.json

Each Chrome profile has its own cookies, history, and logged-in sessions. Choosing the right profile determines whether sites will be pre-authenticated.

Cloud Profiles

Cloud profiles store browser state (cookies) in Browser-Use Cloud, persisting across sessions. Requires BROWSER_USE_API_KEY.

browser-use profile list                      # List cloud profiles
browser-use profile get <id>                  # Get profile details
browser-use profile update <id> --name "New"  # Rename profile
browser-use profile delete <id>               # Delete profile

Use a cloud profile with --browser remote --profile <id>:

browser-use --browser remote --profile abc-123 open https://example.com

Syncing Cookies to Cloud

⚠️ IMPORTANT: Before syncing cookies from a local browser to the cloud, the agent MUST:

  1. Ask the user which local Chrome profile to use (browser-use profile list-local)
  2. Ask which domain(s) to sync - do NOT default to syncing the full profile
  3. Confirm before proceeding

Default behavior: Create a NEW cloud profile for each domain sync. This ensures clear separation of concerns for cookies. Users can add cookies to existing profiles if needed.

Step 1: List available profiles and cookies

# List local Chrome profiles
browser-use profile list-local
# → Default: Person 1 ([email protected])
# → Profile 1: Work ([email protected])

# See what cookies are in a profile
browser-use profile cookies "Default"
# → youtube.com: 23
# → google.com: 18
# → github.com: 2

Step 2: Sync cookies (three levels of control)

1. Domain-specific sync (recommended default)

browser-use profile sync --from "Default" --domain youtube.com
# Creates new cloud profile: "Chrome - Default (youtube.com)"
# Only syncs youtube.com cookies

This is the recommended approach - sync only the cookies you need.

2. Full profile sync (use with caution)

browser-use profile sync --from "Default"
# Syncs ALL cookies from the profile

⚠️ Warning: This syncs ALL cookies including sensitive data, tracking cookies, session tokens for every site, etc. Only use when the user explicitly needs their entire browser state.

3. Fine-grained control (advanced)

# Export cookies to file
browser-use --browser real --profile "Default" cookies export /tmp/cookies.json

# Manually edit the JSON to keep only specific cookies

# Import to cloud profile
browser-use --browser remote --profile <id> cookies import /tmp/cookies.json

For users who nee


Content truncated.

You might also like

flutter-development

aj-geddes

Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.

1,5731,370

ui-ux-pro-max

nextlevelbuilder

"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."

1,1161,191

drawio-diagrams-enhanced

jgtolentino

Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.

1,4181,109

godot

bfollington

This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.

1,196748

nano-banana-pro

garg-aayush

Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.

1,154684

pdf-to-markdown

aliceisjustplaying

Convert entire PDF documents to clean, structured Markdown for full context loading. Use this skill when the user wants to extract ALL text from a PDF into context (not grep/search), when discussing or analyzing PDF content in full, when the user mentions "load the whole PDF", "bring the PDF into context", "read the entire PDF", or when partial extraction/grepping would miss important context. This is the preferred method for PDF text extraction over page-by-page or grep approaches.

1,317614

Stay ahead of the MCP ecosystem

Get weekly updates on new skills and servers.