Scrapling Fetch


cyberchitta

Fetches web page content that bot-detection systems normally block, letting AI assistants access protected websites that would otherwise be inaccessible.

Local (stdio)

What it does

  • Fetch complete web pages bypassing bot detection
  • Extract specific content patterns with regex
  • Paginate long pages by resuming from a character offset (start_index)
  • Use three protection levels (basic, stealth, max-stealth)
  • Retrieve text and HTML content only
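The tool descriptions recommend starting in 'basic' mode (fastest) and escalating to 'stealth' or 'max-stealth' only if basic mode fails. A minimal client-side sketch of that strategy follows. It assumes a hypothetical `fetch(url, mode)` callable that raises an exception when a mode fails; how the real server reports failure is not described on this page.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Protection levels in the order the tool descriptions suggest trying them.
MODES: Sequence[str] = ("basic", "stealth", "max-stealth")


class AllModesFailed(Exception):
    """Raised when every protection level has been tried without success."""


def fetch_with_escalation(
    url: str,
    fetch: Callable[[str, str], str],  # hypothetical: fetch(url, mode) -> content
    modes: Sequence[str] = MODES,
) -> tuple[str, str]:
    """Try each mode from fastest to stealthiest; return (mode_used, content)."""
    errors: list[str] = []
    for mode in modes:
        try:
            return mode, fetch(url, mode)
        except Exception as exc:  # assumption: a failed mode raises an exception
            errors.append(f"{mode}: {exc}")
    raise AllModesFailed("; ".join(errors))


# Usage with a fake fetcher that is "blocked" in basic mode only.
def fake_fetch(url: str, mode: str) -> str:
    if mode == "basic":
        raise RuntimeError("blocked")
    return f"<html>{url} via {mode}</html>"


print(fetch_with_escalation("https://example.com", fake_fetch))
# ('stealth', '<html>https://example.com via stealth</html>')
```

Trying the cheap mode first keeps the common case fast and reserves the slower stealth modes for sites that actually block basic requests.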

Best for

  • Accessing documentation on protected sites
  • Retrieving reference materials from bot-protected websites
  • Low-volume content retrieval for research
  • AI assistants needing access to blocked web content

  • Bypasses bot detection mechanisms
  • Three stealth protection levels
  • Optimized for documentation retrieval

Tools (2)

s_fetch_page

Fetches a complete web page with pagination support, using bot-detection avoidance. For best performance, start with 'basic' mode (fastest) and escalate to 'stealth' or 'max-stealth' only if basic mode fails. Content is returned as 'METADATA: {json}\n\n[content]', where the metadata includes length information and truncation status.

Args:
  • url: URL to fetch
  • mode: Fetching mode (basic, stealth, or max-stealth)
  • format: Output format (html or markdown)
  • max_length: Maximum number of characters to return
  • start_index: Return output starting at this character index; useful when a previous fetch was truncated and more content is required
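A minimal client-side sketch of the response format and pagination described for s_fetch_page. It relies only on what the description states: a 'METADATA: ' prefix, a JSON object, a blank line, then the content. The metadata key names are not documented here, so the sketch does not read them; instead it assumes that a chunk shorter than max_length means the page is exhausted, and advances start_index by the length of each chunk. `call_tool` is a hypothetical stand-in for however your MCP client invokes the tool.

```python
from __future__ import annotations

import json
from typing import Any, Callable

PREFIX = "METADATA: "


def parse_response(raw: str) -> tuple[dict[str, Any], str]:
    """Split a 'METADATA: {json}' + blank line + content response into its parts."""
    if not raw.startswith(PREFIX):
        raise ValueError("response does not start with 'METADATA: '")
    body = raw[len(PREFIX):]
    # raw_decode parses one JSON value and reports where it ended,
    # so we don't have to guess where the metadata stops.
    metadata, end = json.JSONDecoder().raw_decode(body)
    content = body[end:]
    if content.startswith("\n\n"):
        content = content[2:]  # drop the separator between metadata and content
    return metadata, content


def fetch_all(
    call_tool: Callable[..., str],  # hypothetical: invokes s_fetch_page, returns raw text
    url: str,
    mode: str = "basic",
    max_length: int = 5000,
) -> str:
    """Fetch a long page in max_length-sized pieces by advancing start_index."""
    pieces: list[str] = []
    start_index = 0
    while True:
        raw = call_tool(url=url, mode=mode, format="markdown",
                        max_length=max_length, start_index=start_index)
        _metadata, content = parse_response(raw)
        pieces.append(content)
        # Assumption: a chunk shorter than max_length means the page is exhausted.
        # A real client could check the truncation status in the metadata instead.
        if len(content) < max_length:
            break
        start_index += len(content)
    return "".join(pieces)


# Usage with a fake server; the metadata keys below are made up for the demo.
DOC = "abcdefghij" * 3


def fake_call_tool(url, mode, format, max_length, start_index):
    chunk = DOC[start_index:start_index + max_length]
    meta = {"total_length": len(DOC), "truncated": start_index + len(chunk) < len(DOC)}
    return PREFIX + json.dumps(meta) + "\n\n" + chunk


print(fetch_all(fake_call_tool, "https://example.com", max_length=7) == DOC)  # True
```

Parsing the JSON with `raw_decode` rather than splitting on the first blank line avoids depending on how the metadata object is formatted.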

s_fetch_pattern

Extracts content matching a regular expression from a web page, using bot-detection avoidance. For best performance, start with 'basic' mode (fastest) and escalate to 'stealth' or 'max-stealth' only if basic mode fails. Matched content is returned as 'METADATA: {json}\n\n[content]', where the metadata includes match statistics and truncation information. Each matched chunk is delimited with '॥๛॥' and prefixed with '[Position: start-end]', giving its byte position in the original document so that targeted follow-up requests can be made with s_fetch_page using a specific start_index.

Args:
  • url: URL to fetch
  • search_pattern: Regular expression to search for in the content
  • mode: Fetching mode (basic, stealth, or max-stealth)
  • format: Output format (html or markdown)
  • max_length: Maximum number of characters to return
  • context_chars: Number of characters to include before and after each match
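A minimal sketch of consuming s_fetch_pattern output, assuming the exact layout is a '[Position: start-end]' header followed by the chunk text, with chunks separated by '॥๛॥' (the surrounding whitespace is a guess). `extract_with_context` is a hypothetical local illustration of what context_chars means (each match widened by N characters on both sides), not the server's code. Note that the description calls these byte positions while start_index is a character index; the two coincide only for ASCII text, so treat the mapping as approximate.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

DELIMITER = "॥๛॥"
# Assumed header format: "[Position: <start>-<end>]" at the start of each chunk.
POSITION_RE = re.compile(r"^\[Position: (\d+)-(\d+)\]\s*")


@dataclass
class Match:
    start: int
    end: int
    text: str


def parse_matches(content: str) -> list[Match]:
    """Split delimited chunks and read the position header from each one."""
    matches: list[Match] = []
    for chunk in content.split(DELIMITER):
        chunk = chunk.strip()
        if not chunk:
            continue
        m = POSITION_RE.match(chunk)
        if m is None:
            continue  # skip anything without a position header
        matches.append(Match(int(m.group(1)), int(m.group(2)), chunk[m.end():]))
    return matches


def extract_with_context(text: str, pattern: str, context_chars: int) -> list[Match]:
    """Hypothetical local illustration: each regex match plus surrounding characters."""
    out: list[Match] = []
    for m in re.finditer(pattern, text):
        lo = max(0, m.start() - context_chars)
        hi = min(len(text), m.end() + context_chars)
        out.append(Match(lo, hi, text[lo:hi]))
    return out
```

For a follow-up call, a client could pass `parse_matches(content)[0].start` as the start_index of an s_fetch_page request to read onward from the first match.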

Alternatives