agent-browser-skill

Name: agent-browser-skill
Author: MGdaasLab

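A browser automation tool built on the agent-browser CLI. It provides snapshot capture, element interaction, screenshots, and more. Recommended when you need to analyze page snapshots and interact with elements through ref references.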

Install

mkdir -p .claude/skills/agent-browser-skill && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3575" && unzip -o skill.zip -d .claude/skills/agent-browser-skill && rm skill.zip

Installs to .claude/skills/agent-browser-skill

About this skill

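Agent Browser: Browser Automation

A browser automation tool built on the Vercel agent-browser CLI, designed specifically for AI agents.

⚠️ Core workflow: Snapshot + Ref

Never guess selectors! Always take a snapshot first, then operate on elements through their refs.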

# 1. Open the page
agent-browser open http://example.com

# 2. Take a snapshot of the interactive elements
agent-browser snapshot -i
# Example output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]

# 3. Operate on elements by ref
agent-browser click @e2
agent-browser fill @e3 "user@example.com"
agent-browser get text @e1

# 4. Take a fresh snapshot after the page changes
agent-browser snapshot -i
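
The lookup from a snapshot line to its ref (steps 2 and 3) can be scripted. Below is a minimal sketch, assuming snapshot lines keep the `- role "name" [ref=eN]` shape shown in the sample output. `ref_for` is a hypothetical helper and not part of agent-browser.

```shell
#!/usr/bin/env bash
# ref_for ROLE NAME: read snapshot text on stdin and print the first matching ref as @eN.
# Hypothetical helper; assumes lines look like: - button "Submit" [ref=e2]
ref_for() {
  local role=$1 name=$2
  awk -v role="$role" -v name="$name" '
    {
      # Build the literal prefix we expect, e.g.: - button "Submit"
      prefix = "- " role " \"" name "\" "
      pos = index($0, prefix)
      if (pos == 0) next
      rest = substr($0, pos + length(prefix))
      # Pull the id out of the [ref=...] marker that follows the name.
      if (match(rest, /\[ref=[A-Za-z0-9]+\]/)) {
        print "@" substr(rest, RSTART + 5, RLENGTH - 6)
        exit
      }
    }'
}

# Usage:
#   snapshot=$(agent-browser snapshot -i)
#   agent-browser click "$(printf '%s\n' "$snapshot" | ref_for button "Submit")"
```

An empty result means the element was not found in the snapshot, which is a signal to take a new snapshot rather than guess a selector.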

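Usage

Invoke agent-browser commands directly from the shell. Each command is standalone and automatically connects to a browser instance managed by a background daemon.

Installation requirements

# Install globally
npm install -g agent-browser

# Download Chromium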
agent-browser install

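Core commands

Snapshot (page analysis)

agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (recommended)
agent-browser snapshot -c         # Compact output
agent-browser snapshot -d 3       # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to a CSS selector
agent-browser snapshot --json     # JSON output (suited to programmatic processing)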

Interaction (using @ref)

agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click
agent-browser focus @e1           # Focus the element
agent-browser fill @e2 "text"     # Clear, then type
agent-browser type @e2 "text"     # Append text (without clearing)
agent-browser press Enter         # Press a key
agent-browser press Control+a     # Key combination
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check a checkbox
agent-browser uncheck @e1         # Uncheck a checkbox
agent-browser select @e1 "value"  # Select a dropdown option
agent-browser scroll down 500     # Scroll down 500px
agent-browser scrollintoview @e1  # Scroll until the element is visible
agent-browser drag @e1 @e2        # Drag and drop
agent-browser upload @e1 file.pdf # Upload a file

Getting information

agent-browser get text @e1        # Get the element's text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get an input's value
agent-browser get attr @e1 href   # Get an attribute
agent-browser get title           # Get the page title
agent-browser get url             # Get the current URL
agent-browser get count ".item"   # Count matching elements
agent-browser get box @e1         # Get the element's bounding box

State checks

agent-browser is visible @e1      # Check whether the element is visible
agent-browser is enabled @e1      # Check whether the element is enabled
agent-browser is checked @e1      # Check whether the element is checked

Screenshots & PDF

agent-browser screenshot              # Screenshot to stdout (base64)
agent-browser screenshot ./page.png   # Save to a file
agent-browser screenshot --full       # Full-page screenshot
agent-browser pdf output.pdf          # Save as PDF

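Waiting

agent-browser wait @e1            # Wait until the element is visible
agent-browser wait 2000           # Wait 2000 milliseconds
agent-browser wait --text "Success"  # Wait for the text to appear
agent-browser wait --url "**/dashboard"  # Wait for the URL to match
agent-browser wait --load networkidle    # Wait for the network to go idle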

CSS selectors (also supported)

agent-browser click "#submit"
agent-browser fill "#email" "user@example.com"
agent-browser find role button click --name "Submit"

Session management

Multiple AI agents can each use their own browser instance:

# Separate sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

# Or via an environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click @e1

# List active sessions
agent-browser session list
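
When one script drives several sessions, a small wrapper keeps each command tied to its session. This is a minimal sketch built on the AGENT_BROWSER_SESSION variable shown above. `ab_as` and the `AB_BIN` override are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# AB_BIN is a hypothetical override so the wrapper can be dry-run with another command.
AB_BIN="${AB_BIN:-agent-browser}"

# ab_as SESSION ARGS...: run one agent-browser command inside the named session
# by setting AGENT_BROWSER_SESSION for that single invocation only.
ab_as() {
  local session=$1
  shift
  AGENT_BROWSER_SESSION="$session" "$AB_BIN" "$@"
}

# Usage:
#   ab_as agent1 open site-a.com
#   ab_as agent2 open site-b.com
#   ab_as agent1 snapshot -i
```

Setting the variable per invocation, rather than exporting it, avoids one agent's session leaking into commands meant for another.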

Screenshot path convention

Save all screenshots to the directory specified by the SCREENSHOT_DIR environment variable:

SCREENSHOT_DIR="$(pwd)/media/screenshots"
agent-browser screenshot "${SCREENSHOT_DIR}/case_11_step1.png"
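
The naming pattern in the example can be wrapped in a helper so every step produces a consistent path. A minimal sketch, assuming the `case_<id>_step<n>.png` pattern generalizes from the single example above. `shot_path` is a hypothetical helper, and creating the directory up front is a precaution, not a documented requirement.

```shell
#!/usr/bin/env bash
# Fall back to the directory used in the example when SCREENSHOT_DIR is unset.
SCREENSHOT_DIR="${SCREENSHOT_DIR:-$(pwd)/media/screenshots}"

# shot_path CASE_ID STEP: print the screenshot path for a test case step.
# Hypothetical helper; also creates the target directory if it is missing.
shot_path() {
  local case_id=$1 step=$2
  mkdir -p "$SCREENSHOT_DIR"
  printf '%s/case_%s_step%s.png\n' "$SCREENSHOT_DIR" "$case_id" "$step"
}

# Usage:
#   agent-browser screenshot "$(shot_path 11 1)"
```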

Typical use cases

Login test

# Open the login page
agent-browser open http://192.168.150.114:8913/login

# Get the page elements
agent-browser snapshot -i
# Output:
# - textbox "Enter username" [ref=e1]
# - textbox "Enter password" [ref=e2]
# - button "Log in" [ref=e3]

# Fill in the form
agent-browser fill @e1 "admin"
agent-browser fill @e2 "admin123456"

# Take a screenshot
agent-browser screenshot ./step1_filled.png

# Click the login button
agent-browser click @e3

# Wait for the redirect
agent-browser wait --url "**/dashboard"
agent-browser snapshot -i

# Screenshot the result
agent-browser screenshot ./step2_result.png

# Close the browser
agent-browser close
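
To turn the sequence above into a pass/fail check, each command can be run through a small step runner. This is a sketch under one unverified assumption: that agent-browser exits with a non-zero status when a command fails. `step` and the `AB_BIN` override are hypothetical names.

```shell
#!/usr/bin/env bash
# AB_BIN is a hypothetical override, useful for dry runs.
AB_BIN="${AB_BIN:-agent-browser}"

# step LABEL ARGS...: run one agent-browser command and report PASS or FAIL.
# Assumes the CLI exits non-zero when a command fails.
step() {
  local label=$1
  shift
  if "$AB_BIN" "$@" >/dev/null; then
    printf 'PASS %s\n' "$label"
  else
    printf 'FAIL %s\n' "$label" >&2
    return 1
  fi
}

# Usage, with refs taken from the login page snapshot:
#   step "fill username"   fill @e1 "admin" &&
#   step "fill password"   fill @e2 "admin123456" &&
#   step "submit"          click @e3 &&
#   step "reach dashboard" wait --url "**/dashboard"
```

Chaining with `&&` stops at the first failing step, so later commands never act on a page that is in an unexpected state.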

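Comparison with playwright-skill

Feature	agent-browser-skill	playwright-skill
Invocation	Standalone CLI commands	node run.js "code"
Element targeting	Snapshot + @ref	CSS selectors
State persistence	Automatic daemon	New browser launched each time
AI friendliness	High (designed for AI)	Medium
Code complexity	Simple commands	Requires writing JS code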
