Web Fetcher

By jae-jae

Fetches web page content using Playwright's headless browser, extracting clean, readable text from JavaScript-heavy websites. Outputs content in HTML or Markdown format for research and data gathering.

What it does

  • Fetch content from JavaScript-rendered websites
  • Extract main content while removing ads and navigation
  • Process multiple URLs in parallel
  • Output content in HTML or Markdown format
  • Handle dynamic web applications and SPAs

Best for

  • Web scraping and content extraction
  • Research and information gathering
  • Content analysis from modern web apps
  • Batch processing of multiple websites

Key capabilities: JavaScript execution support, intelligent content extraction with Readability, and parallel URL processing.

About Web Fetcher

Web Fetcher is a community-built MCP server published by jae-jae that provides AI assistants with tools and capabilities via the Model Context Protocol. Web Fetcher uses Playwright for reliable web scraping and data extraction from JavaScript-heavy websites, returning clean, readable content. It is categorized under browser automation and web search. This server exposes 3 tools that AI clients can invoke during conversations and coding sessions.

How to install

You can install Web Fetcher in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. By default, this server runs locally on your machine via the stdio transport; it also supports remote connections over HTTP, in which case no local installation is required.

License

Web Fetcher is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Tools (3)

fetch_url

Retrieve web page content from a specified URL

fetch_urls

Retrieve web page content from multiple specified URLs

browser_install

Install Playwright Chromium browser binary. Call this if you get an error about the browser not being installed.

Fetcher MCP

An MCP server for fetching web page content using a Playwright headless browser.

🌟 Recommended: OllaMan - Powerful Ollama AI Model Manager.

Advantages

  • JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.

  • Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

  • Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.

  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.

  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.

  • Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.

  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

First-time setup: install the required browser by running the following command in your terminal:

npx playwright install chromium

HTTP and SSE Transport

Use the --transport=http parameter to start both the Streamable HTTP endpoint and the SSE endpoint simultaneously:

npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000

After startup, the server provides the following endpoints:

  • /mcp - Streamable HTTP endpoint (modern MCP protocol)
  • /sse - SSE endpoint (legacy MCP protocol)

Clients can choose which method to connect based on their needs.
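
As an illustration, an MCP client that supports remote servers could point at the Streamable HTTP endpoint with a URL-based entry. The exact key names vary by client (`url` is a common convention), and the host and port below assume the startup command above:

```json
{
  "mcpServers": {
    "fetcher": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
```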

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

Configuration MCP

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}

Docker Deployment

Running with Docker

docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest

Deploying with Docker Compose

Create a docker-compose.yml file:

version: "3.8"

services:
  fetcher-mcp:
    image: ghcr.io/jae-jae/fetcher-mcp:latest
    container_name: fetcher-mcp
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    # Using host network mode on Linux hosts can improve browser access efficiency
    # network_mode: "host"
    volumes:
      # For Playwright, may need to share certain system paths
      - /tmp:/tmp
    # Health check
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3

Then run:

docker-compose up -d

Features

  • fetch_url - Retrieve web page content from a specified URL

    • Uses Playwright headless browser to parse JavaScript
    • Supports intelligent extraction of main content and conversion to Markdown
    • Supports the following parameters:
      • url: The URL of the web page to fetch (required parameter)
      • timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)
      • waitUntil: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
      • extractContent: Whether to intelligently extract the main content, default is true
      • maxLength: Maximum length of returned content (in characters), default is no limit
      • returnHtml: Whether to return HTML content instead of Markdown, default is false
      • waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
      • navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
      • disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true
      • debug: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified
  • fetch_urls - Batch retrieve web page content from multiple URLs in parallel

    • Uses multi-tab parallel fetching for improved performance
    • Returns combined results with clear separation between webpages
    • Supports the following parameters:
      • urls: Array of URLs to fetch (required parameter)
      • Other parameters are the same as fetch_url
  • browser_install - Install Playwright Chromium browser binary automatically

    • Installs required Chromium browser binary when not available
    • Automatically suggested when browser installation errors occur
    • Supports the following parameters:
      • withDeps: Install system dependencies required by Chromium browser, default is false
      • force: Force installation even if Chromium is already installed, default is false
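
Putting the parameter lists above together, a fetch_urls invocation from an MCP client might carry arguments like the following (values are illustrative, not defaults):

```json
{
  "urls": [
    "https://example.com/page-1",
    "https://example.com/page-2"
  ],
  "timeout": 30000,
  "extractContent": true,
  "returnHtml": false
}
```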

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

  • Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

    Please wait for the page to fully load
    

    This will use the waitForNavigation: true parameter.

  • Increase Timeout Duration: For websites that load slowly:

    Please set the page loading timeout to 60 seconds
    

    This adjusts both timeout and navigationTimeout parameters accordingly.
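
In terms of fetch_url arguments, the two prompts above map roughly onto the following (illustrative values):

```json
{
  "url": "https://example.com/slow-page",
  "waitForNavigation": true,
  "timeout": 60000,
  "navigationTimeout": 60000
}
```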

Content Retrieval Adjustments

  • Preserve Original HTML Structure: When content extraction might fail:

    Please preserve the original HTML content
    

    Sets extractContent: false and returnHtml: true.

  • Fetch Complete Page Content: When extracted content is too limited:

    Please fetch the complete webpage content instead of just the main content
    

    Sets extractContent: false.

  • Return Content as HTML: When HTML format is needed instead of default Markdown:

    Please return the content in HTML format
    

    Sets returnHtml: true.

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: To display the browser window during a specific fetch operation:
    Please enable debug mode for this fetch operation
    
    This sets debug: true even if the server was started without the --debug flag.

Using Custom Cookies for Authentication

  • Manual Login: To login using your own credentials:

    Please run in debug mode so I can manually log in to the website
    

    Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.

  • Interacting with Debug Browser: When debug mode is enabled:

    1. The browser window remains open
    2. You can manually log into the website using your credentials
    3. After login is complete, content will be fetched with your authenticated session
  • Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:

    Please enable debug mode for this authentication step
    

    Sets debug: true for this specific request only, opening the browser window for manual login.

Development

Install Dependencies

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

License

Licensed under the MIT License

Powered by DartNode
