github-wayback-recovery

Recover deleted GitHub content using the Wayback Machine and Archive.org APIs. Use when repositories, files, issues, PRs, or wiki pages have been deleted from GitHub but may persist in web archives. Covers CDX API queries, URL patterns, and systematic recovery workflows.

Install

mkdir -p .claude/skills/github-wayback-recovery && curl -L -o skill.zip "https://mcp.directory/api/skills/download/6362" && unzip -o skill.zip -d .claude/skills/github-wayback-recovery && rm skill.zip

Installs to .claude/skills/github-wayback-recovery

About this skill

GitHub Wayback Recovery

Purpose: Recover deleted GitHub content (README files, issues, PRs, wiki pages, repository metadata) from the Internet Archive's Wayback Machine when content is no longer available on GitHub.

When to Use This Skill

Repository has been deleted and you need README, wiki, or metadata
Issues or PRs were deleted by author, maintainer, or moderation
Need to recover file contents that may have been archived
Investigating historical state of a repository
Finding forks of deleted repositories via archived network pages
Recovering release notes or documentation from deleted projects

Complementary Skills:

github-archive: For structured event data (who did what, when) - always check first
github-commit-recovery: For accessing commits when you have SHAs
github-wayback-recovery (this skill): For web page snapshots when content is fully deleted

Core Principles

Wayback Machine Archives Web Pages, Not Git Repositories:

Cannot git clone from archived content
Cannot reconstruct full commit history
Recovery success depends on whether specific URLs were crawled

What CAN Be Recovered:

README files and repository descriptions
Issue titles, bodies, and comments (Archive Team prioritizes these)
PR conversations and descriptions (Files Changed tab often fails)
Wiki pages (especially wiki home)
Release notes and descriptions
Repository metadata (stars, language, license visible on homepage)
Commit SHAs from archived commit list pages (use with github-commit-recovery skill to access actual content)

What CANNOT Be Recovered:

Private repository content (never crawled)
Complete git history or repository clone
Content behind authentication

Quick Start

Check if a repository page was archived:

curl -s "https://archive.org/wayback/available?url=github.com/owner/repo" | jq .

Search for all archived URLs under a repository:

curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/*&output=json&collapse=urlkey" | head -50

Access an archived snapshot:

https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo
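A small helper can fill in the snapshot template above. This is a minimal sketch, and the name `snapshot_url` is hypothetical. It also shows the Wayback Machine's `id_` modifier, which returns the capture as it was archived, without the replay toolbar or rewritten links. That form is usually easier to parse programmatically.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(timestamp: str, original: str, raw: bool = False) -> str:
    """Build a Wayback Machine snapshot URL (hypothetical helper).

    timestamp: 4-14 digits of YYYYMMDDhhmmss; the Wayback Machine redirects
    an inexact timestamp to the nearest capture.
    raw: append the `id_` modifier to get the capture without the replay
    toolbar or rewritten links.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError(f"timestamp must be 4-14 digits, got {timestamp!r}")
    modifier = "id_" if raw else ""
    return f"{WAYBACK_PREFIX}/{timestamp}{modifier}/{original}"


# Usage: the Quick Start template with a concrete timestamp
print(snapshot_url("20230615142311", "https://github.com/owner/repo"))
```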

GitHub URL Patterns for Archive Searches

Understanding GitHub's URL structure is essential for constructing archive queries.

Repository-Level URLs

Content Type	URL Pattern
Homepage	`github.com/{owner}/{repo}`
Commits list	`github.com/{owner}/{repo}/commits/{branch}`
Individual commit	`github.com/{owner}/{repo}/commit/{full-sha}`
Fork network	`github.com/{owner}/{repo}/network/members`

File and Directory URLs

Content Type	URL Pattern
File view	`github.com/{owner}/{repo}/blob/{branch}/{path/to/file}`
Directory view	`github.com/{owner}/{repo}/tree/{branch}/{directory}`
File history	`github.com/{owner}/{repo}/commits/{branch}/{path/to/file}`
Raw file	`raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}`

Note: blob = files, tree = directories. Raw URLs are rarely archived compared to rendered views.

Collaboration Artifacts

Content Type	URL Pattern
Pull request	`github.com/{owner}/{repo}/pull/{number}`
PR files	`github.com/{owner}/{repo}/pull/{number}/files`
PR commits	`github.com/{owner}/{repo}/pull/{number}/commits`
Issue	`github.com/{owner}/{repo}/issues/{number}`
Wiki page	`github.com/{owner}/{repo}/wiki/{page-name}`
Release	`github.com/{owner}/{repo}/releases/tag/{tag-name}`
All PRs	`github.com/{owner}/{repo}/pulls?state=all`
All issues	`github.com/{owner}/{repo}/issues?state=all`
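The patterns in these tables can be stored as format strings, so every archive query is built the same way. This is a minimal sketch. The dictionary keys (`blob`, `issue`, and so on) and the `readme_candidates` helper are illustrative names. Trying both `main` and `master` is an assumption, because the deleted repository's default branch is often unknown.

```python
# URL templates transcribed from the tables above (keys are illustrative)
GITHUB_PATTERNS = {
    "homepage": "github.com/{owner}/{repo}",
    "commits": "github.com/{owner}/{repo}/commits/{branch}",
    "commit": "github.com/{owner}/{repo}/commit/{sha}",
    "forks": "github.com/{owner}/{repo}/network/members",
    "blob": "github.com/{owner}/{repo}/blob/{branch}/{path}",
    "tree": "github.com/{owner}/{repo}/tree/{branch}/{path}",
    "file_history": "github.com/{owner}/{repo}/commits/{branch}/{path}",
    "raw": "raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}",
    "pull": "github.com/{owner}/{repo}/pull/{number}",
    "pull_files": "github.com/{owner}/{repo}/pull/{number}/files",
    "pull_commits": "github.com/{owner}/{repo}/pull/{number}/commits",
    "issue": "github.com/{owner}/{repo}/issues/{number}",
    "wiki": "github.com/{owner}/{repo}/wiki/{page}",
    "release": "github.com/{owner}/{repo}/releases/tag/{tag}",
    "all_pulls": "github.com/{owner}/{repo}/pulls?state=all",
    "all_issues": "github.com/{owner}/{repo}/issues?state=all",
}


def github_url(kind: str, **fields) -> str:
    """Fill one template; raises KeyError for an unknown kind or a missing field."""
    return GITHUB_PATTERNS[kind].format(**fields)


def readme_candidates(owner: str, repo: str, branches=("main", "master")) -> list:
    """Blob URLs worth querying when the default branch name is unknown."""
    return [
        github_url("blob", owner=owner, repo=repo, branch=b, path="README.md")
        for b in branches
    ]
```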

CDX API Reference

The Capture Index (CDX) API provides structured search across all archived URLs.

Basic Query Structure

https://web.archive.org/cdx/search/cdx?url={URL}&output=json

Essential Parameters

Parameter	Effect	Example
`matchType=exact`	Exact URL only (default)	Single page
`matchType=prefix`	All URLs starting with path	All repo content
`url=.../*`	Wildcard (same as prefix)	`github.com/owner/repo/*`
`from=YYYY`	Start date filter	`from=2023`
`to=YYYY`	End date filter	`to=2024`
`filter=statuscode:200`	Only successful captures	Skip redirects/errors
`collapse=timestamp:8`	One capture per day	Reduce duplicates
`collapse=urlkey`	Unique URLs only	List all archived pages
`limit=N`	Limit results	`limit=100`
`output=json`	JSON format	Machine-readable
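Below is a sketch of assembling a CDX query from these parameters with the standard library. `cdx_query_url` is a hypothetical helper. It percent-encodes values, so a target URL that contains `?` or `&` (for example `issues?state=all`) stays inside the `url` parameter instead of leaking into the query string. `filter` can be given more than once, so the helper accepts a list of filters.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

CDX_API = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(
    url: str,
    match_type: str = "exact",
    from_: Optional[str] = None,
    to: Optional[str] = None,
    filters: Iterable[str] = (),
    collapse: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a CDX API query URL (hypothetical helper)."""
    params = [("url", url), ("output", "json")]
    if match_type != "exact":  # exact is the server default
        params.append(("matchType", match_type))
    if from_:
        params.append(("from", from_))
    if to:
        params.append(("to", to))
    # each filter is field:regex, optionally prefixed with ! to negate
    params.extend(("filter", f) for f in filters)
    if collapse:
        params.append(("collapse", collapse))
    if limit is not None:
        params.append(("limit", str(limit)))
    # leave ':' '/' '*' readable; '?', '&', '=' inside values are still escaped
    return f"{CDX_API}?{urlencode(params, safe=':/*')}"


print(cdx_query_url("github.com/owner/repo/issues?state=all",
                    filters=["statuscode:200"], collapse="urlkey", limit=100))
```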

Query Examples

Find all archived pages under a repository:

curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/facebook/react/*&matchType=prefix&output=json&collapse=urlkey"

Find archived issues for a specific repository:

curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/issues/*&output=json&collapse=urlkey&filter=statuscode:200"

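Find archived snapshots of a specific file on any branch. The CDX API does not support a `*` in the middle of a URL, so match the `blob/` prefix and filter on the `original` field:

curl -s 'https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/blob/&matchType=prefix&filter=original:.*/path/to/file$&output=json'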

Check for archived snapshots near a specific date:

curl -s "https://archive.org/wayback/available?url=github.com/owner/repo&timestamp=20230615"

CDX Response Format

[
  ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
  ["com,github)/owner/repo", "20230615142311", "https://github.com/owner/repo", "text/html", "200", "ABC123...", "12345"]
]
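The first row is a header, so each later row can be zipped into a dictionary. The sketch below (function names are hypothetical) parses the JSON text and keeps the newest capture for each `urlkey`. Comparing timestamps as strings works because they are fixed-width, 14-digit `YYYYMMDDhhmmss` values.

```python
import json
from typing import Dict, Iterable, List


def parse_cdx(payload: str) -> List[Dict[str, str]]:
    """Convert CDX `output=json` text into one dict per capture."""
    if not payload.strip():
        return []
    rows = json.loads(payload)
    if len(rows) <= 1:  # empty list, or header row only
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def latest_per_url(records: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep the newest capture for each urlkey.

    Timestamps are fixed-width 14-digit strings, so string comparison
    orders them chronologically.
    """
    latest: Dict[str, Dict[str, str]] = {}
    for rec in records:
        key = rec["urlkey"]
        if key not in latest or rec["timestamp"] > latest[key]["timestamp"]:
            latest[key] = rec
    return latest
```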

Investigation Patterns

Recovering Deleted File Contents

Scenario: Repository or file has been deleted, need to recover file contents.

Step 1: Search for blob URLs

curl -s 'https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/blob/&matchType=prefix&filter=original:.*/README.md$&output=json'
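The CDX API honors `*` only at the end of a URL, so a blob path with an unknown branch cannot be matched with a wildcard in the middle. One option is to list blob captures with a prefix query and filter the `original` field on the client. The sketch below uses a hypothetical function and treats the branch or ref as a single path segment, so branch names that contain `/` are not matched.

```python
import re
from typing import Dict, Iterable, List


def blob_captures_for(records: Iterable[Dict[str, str]], owner: str,
                      repo: str, path: str) -> List[Dict[str, str]]:
    """Select captures of github.com/{owner}/{repo}/blob/<ref>/{path}, oldest first."""
    pattern = re.compile(
        rf"^https?://(?:www\.)?github\.com(?::\d+)?/{re.escape(owner)}/{re.escape(repo)}"
        rf"/blob/(?P<ref>[^/?#]+)/{re.escape(path)}(?:[?#].*)?$"
    )
    hits = []
    for rec in records:
        m = pattern.match(rec["original"])
        if m:
            # remember which branch/ref the capture came from
            hits.append({**rec, "ref": m.group("ref")})
    return sorted(hits, key=lambda r: r["timestamp"])
```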

Step 2: Construct archive URL from timestamp

https://web.archive.org/web/20230615142311/https://github.com/owner/repo/blob/main/README.md

Step 3: Extract content manually or use waybackpack

pip install waybackpack
waybackpack "https://github.com/owner/repo/blob/main/README.md" -d output_dir

Forensic Value: Recover documentation, configuration files, or evidence that existed at specific points in time.
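Rendered GitHub pages have typically wrapped README and wiki Markdown in an element whose class includes `markdown-body`. That is an assumption, and older or newer layouts may differ, so check it against the snapshot you have. The sketch fetches a capture with the `id_` modifier and pulls the text out of that element using only the standard library. All names are hypothetical, and the output is rough plain text, not the original Markdown.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MarkdownBodyText(HTMLParser):
    """Collect text inside the first element whose class list has 'markdown-body'."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.done = False   # set once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # void elements never get an end tag, so they must not change depth
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif "markdown-body" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_markdown_text(html: str) -> str:
    parser = MarkdownBodyText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_capture(timestamp: str, original: str) -> str:
    """Download a capture as archived (id_ modifier); requires network access."""
    url = f"https://web.archive.org/web/{timestamp}id_/{original}"
    with urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be `extract_markdown_text(fetch_capture("20230615142311", "https://github.com/owner/repo/blob/main/README.md"))`, with the timestamp taken from the CDX results.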

Recovering Deleted Issue/PR Content

Scenario: Issue or PR was deleted and you need the original content.

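Step 1: Query for issue page snapshots (the trailing `*` is a prefix match, so it also returns issues such as 1234; check the `original` field)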

curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/issues/123*&output=json"

Step 2: Access archived page

https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/issues/123

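Step 3: If the issue number is unknown, list every archived issue URL, including listing pages such as `issues?state=all`, with a prefix query (use `pull` instead of `issues` for PRs; that prefix also covers the `pulls` listing)

curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/issues&matchType=prefix&collapse=urlkey&output=json"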

Note: Archive Team has been actively crawling GitHub issues and PRs since 2020. Issue content is recovered more reliably than file contents.
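Once a prefix query has returned the archived URLs, issue and PR numbers can be read directly from the paths. The sketch below uses hypothetical names. It ignores listing pages, and it counts sub-pages under a number (for example `/pull/45/files`) as that item.

```python
import re
from typing import Dict, Iterable, Set

# owner / repo / (issues|pull) / number, tolerating an explicit :80 port
ITEM_RE = re.compile(r"github\.com(?::\d+)?/([^/]+)/([^/]+)/(issues|pull)/(\d+)")


def archived_item_numbers(originals: Iterable[str], owner: str,
                          repo: str) -> Dict[str, Set[int]]:
    """Collect issue and PR numbers seen in archived URLs for one repository."""
    found: Dict[str, Set[int]] = {"issues": set(), "pull": set()}
    for url in originals:
        m = ITEM_RE.search(url)
        if m and m.group(1).lower() == owner.lower() and m.group(2).lower() == repo.lower():
            found[m.group(3)].add(int(m.group(4)))
    return found
```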

Finding Forks of Deleted Repositories

Scenario: Repository is deleted, but forks may contain the full git history.

Step 1: Search for archived fork network page

curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/network/members&output=json"

Step 2: Access archived network page

https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/network/members

Step 3: Extract fork usernames from the archived page, then check whether each fork still exists

# Check if fork exists
curl -s -o /dev/null -w "%{http_code}" https://github.com/forker/repo

Forensic Value: A surviving fork holds the repository's git history up to the point it was forked or last synced. Cloning it often yields far more than recovering individual files.
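Steps 1 to 3 can be partly automated. The sketch below uses hypothetical names. It collects links of the form `/{user}/{repo}` from a network/members page and checks each candidate on github.com. It assumes the page was fetched with the `id_` modifier, so links are not rewritten to `web.archive.org` paths. It also assumes forks kept the parent's repository name, so renamed forks are missed.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.error import HTTPError
from urllib.request import Request, urlopen


class _HrefCollector(HTMLParser):
    """Gather every <a href> on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fork_candidates(html: str, owner: str, repo: str) -> List[str]:
    """Return distinct 'user/repo' paths that look like forks of owner/repo."""
    collector = _HrefCollector()
    collector.feed(html)
    seen: List[str] = []
    for href in collector.hrefs:
        path = href.split("#")[0].split("?")[0]
        path = re.sub(r"^https?://github\.com", "", path).strip("/")
        parts = path.split("/")
        if (len(parts) == 2 and parts[1].lower() == repo.lower()
                and parts[0].lower() != owner.lower()):
            full = "/".join(parts)
            if full not in seen:
                seen.append(full)
    return seen


def fork_exists(full_name: str) -> bool:
    """HEAD request to github.com; True if the page answers 200 (network needed)."""
    req = Request(f"https://github.com/{full_name}", method="HEAD")
    try:
        with urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except HTTPError:  # 404 and other HTTP error statuses
        return False
```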

Recovering Wiki Content

Scenario: Repository wiki has been deleted or made private.

Step 1: Search for wiki pages

curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/wiki*&output=json&collapse=urlkey"

Step 2: Access wiki home or specific pages

https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/wiki
https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/wiki/Page-Name
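To see which wiki pages were captured, read the page names from the `original` URLs that the wiki query above returns. The sketch below uses a hypothetical function. It maps the bare `/wiki` URL to GitHub's default `Home` page and skips underscore-prefixed tool pages such as `_pages`.

```python
import re
from typing import Iterable, List
from urllib.parse import unquote


def wiki_pages(originals: Iterable[str], owner: str, repo: str) -> List[str]:
    """List distinct wiki page names seen in archived URLs."""
    pattern = re.compile(
        rf"^https?://(?:www\.)?github\.com(?::\d+)?/{re.escape(owner)}/{re.escape(repo)}"
        rf"/wiki(?:/(?P<page>[^/?#]+))?/?(?:[?#].*)?$"
    )
    pages = set()
    for url in originals:
        m = pattern.match(url)
        if not m:
            continue
        name = unquote(m.group("page") or "Home")  # bare /wiki shows Home
        if not name.startswith("_"):  # skip tool pages such as _pages
            pages.add(name)
    return sorted(pages)
```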

Python Implementation

import requests
import json
from typing import Optional, List, Dict
from time import sleep

class WaybackGitHubRecovery:
    CDX_API = "https://web.archive.org/cdx/search/cdx"
    AVAILABILITY_API = "https://archive.org/wayback/available"
    ARCHIVE_URL = "https://web.archive.org/web"

    def check_availability(self, url: str, timestamp: Optional[str] = None) -> Optional[Dict]:
        """Check if URL has any archived snapshots."""
        params = {"url": url}
        if timestamp:
            params["timestamp"] = timestamp

        resp = requests.get(self.AVAILABILITY_API, params=params, timeout=30)
        resp.raise_for_status()  # surface HTTP errors instead of failing in .json()
        data = resp.json()

        if data.get("archived_snapshots", {}).get("closest"):
            return data["archived_snapshots"]["closest"]
        return None

    def search_cdx(self, url: str, match_type: str = "prefix",
                   collapse: str = "urlkey", limit: int = 1000) -> List[Dict]:
        """Search CDX API for archived URLs."""
        params = {
            "url": url,
            "output": "json",
            "matchType": match_type,
            "collapse": collapse,
            "filter": "statuscode:200",
            "limit": limit
        }

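        resp = requests.get(self.CDX_API, params=params, timeout=60)
        resp.raise_for_status()
        if not resp.text.strip():  # guard against an empty body when nothing matches
            return []
        data = resp.json()

        if len(data) <= 1:  # Empty list or header row only
            return []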

        headers = data[0]
        results = []
        for row in data[1:]:
            results.append(dict(zip(headers, row)))

        return results

    def find_repository_content(self, owner: str, repo: str) -> Dict[str, List]:
        """Find all archived content for a repository."""
        bas

---

*Content truncated.*

Author

gadievron

7 skills published
