github-wayback-recovery
Recover deleted GitHub content using the Wayback Machine and Archive.org APIs. Use when repositories, files, issues, PRs, or wiki pages have been deleted from GitHub but may persist in web archives. Covers CDX API queries, URL patterns, and systematic recovery workflows.
Install
mkdir -p .claude/skills/github-wayback-recovery && curl -L -o skill.zip "https://mcp.directory/api/skills/download/6362" && unzip -o skill.zip -d .claude/skills/github-wayback-recovery && rm skill.zip
Installs to .claude/skills/github-wayback-recovery
About this skill
GitHub Wayback Recovery
Purpose: Recover deleted GitHub content (README files, issues, PRs, wiki pages, repository metadata) from the Internet Archive's Wayback Machine when content is no longer available on GitHub.
When to Use This Skill
- Repository has been deleted and you need README, wiki, or metadata
- Issues or PRs were deleted by author, maintainer, or moderation
- Need to recover file contents that may have been archived
- Investigating historical state of a repository
- Finding forks of deleted repositories via archived network pages
- Recovering release notes or documentation from deleted projects
Complementary Skills:
- github-archive: For structured event data (who did what, when) - always check first
- github-commit-recovery: For accessing commits when you have SHAs
- github-wayback-recovery (this skill): For web page snapshots when content is fully deleted
Core Principles
Wayback Machine Archives Web Pages, Not Git Repositories:
- Cannot git clone from archived content
- Cannot reconstruct full commit history
- Recovery success depends on whether specific URLs were crawled
What CAN Be Recovered:
- README files and repository descriptions
- Issue titles, bodies, and comments (Archive Team prioritizes these)
- PR conversations and descriptions (Files Changed tab often fails)
- Wiki pages (especially wiki home)
- Release notes and descriptions
- Repository metadata (stars, language, license visible on homepage)
- Commit SHAs from archived commit list pages (use with github-commit-recovery skill to access actual content)
What CANNOT Be Recovered:
- Private repository content (never crawled)
- Complete git history or repository clone
- Content behind authentication
Quick Start
Check if a repository page was archived:
curl -s "https://archive.org/wayback/available?url=github.com/owner/repo" | jq
Search for all archived URLs under a repository:
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/*&output=json&collapse=urlkey" | head -50
Access an archived snapshot:
https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo
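The snapshot URL pattern above can be assembled programmatically. The following is a minimal sketch; the function name `snapshot_url` is hypothetical, and the length check assumes timestamps are digit strings of up to 14 characters (YYYYMMDDhhmmss).

```python
def snapshot_url(timestamp: str, original_url: str) -> str:
    """Build a Wayback Machine replay URL for one capture."""
    # Assumption: timestamps are digits in YYYYMMDDhhmmss form, and a
    # shorter prefix such as "2023" is also a valid value.
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError(f"unexpected timestamp: {timestamp!r}")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(snapshot_url("20230615142311", "https://github.com/owner/repo"))
```

The timestamp would normally come from an availability or CDX response rather than being typed by hand.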
GitHub URL Patterns for Archive Searches
Understanding GitHub's URL structure is essential for constructing archive queries.
Repository-Level URLs
| Content Type | URL Pattern |
|---|---|
| Homepage | github.com/{owner}/{repo} |
| Commits list | github.com/{owner}/{repo}/commits/{branch} |
| Individual commit | github.com/{owner}/{repo}/commit/{full-sha} |
| Fork network | github.com/{owner}/{repo}/network/members |
File and Directory URLs
| Content Type | URL Pattern |
|---|---|
| File view | github.com/{owner}/{repo}/blob/{branch}/{path/to/file} |
| Directory view | github.com/{owner}/{repo}/tree/{branch}/{directory} |
| File history | github.com/{owner}/{repo}/commits/{branch}/{path/to/file} |
| Raw file | raw.githubusercontent.com/{owner}/{repo}/{branch}/{path} |
Note: blob = files, tree = directories. Raw URLs are rarely archived compared to rendered views.
Collaboration Artifacts
| Content Type | URL Pattern |
|---|---|
| Pull request | github.com/{owner}/{repo}/pull/{number} |
| PR files | github.com/{owner}/{repo}/pull/{number}/files |
| PR commits | github.com/{owner}/{repo}/pull/{number}/commits |
| Issue | github.com/{owner}/{repo}/issues/{number} |
| Wiki page | github.com/{owner}/{repo}/wiki/{page-name} |
| Release | github.com/{owner}/{repo}/releases/tag/{tag-name} |
| All PRs | github.com/{owner}/{repo}/pulls?state=all |
| All issues | github.com/{owner}/{repo}/issues?state=all |
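One way to keep these patterns in a single place is a small lookup table. This is a sketch only: the patterns are copied from the tables above, while the dictionary keys and the `github_url` helper are hypothetical names chosen for illustration.

```python
# URL patterns copied from the tables above; the keys are hypothetical
# labels chosen for this sketch.
GITHUB_PATTERNS = {
    "homepage": "github.com/{owner}/{repo}",
    "commits": "github.com/{owner}/{repo}/commits/{branch}",
    "network": "github.com/{owner}/{repo}/network/members",
    "blob": "github.com/{owner}/{repo}/blob/{branch}/{path}",
    "tree": "github.com/{owner}/{repo}/tree/{branch}/{path}",
    "issue": "github.com/{owner}/{repo}/issues/{number}",
    "pull": "github.com/{owner}/{repo}/pull/{number}",
    "wiki": "github.com/{owner}/{repo}/wiki/{page}",
    "release": "github.com/{owner}/{repo}/releases/tag/{tag}",
}


def github_url(kind: str, **parts: object) -> str:
    """Fill in one of the patterns above, e.g. kind="issue", number=123."""
    try:
        pattern = GITHUB_PATTERNS[kind]
    except KeyError:
        raise ValueError(f"unknown content type: {kind!r}") from None
    return pattern.format(**parts)


print(github_url("issue", owner="owner", repo="repo", number=123))
print(github_url("blob", owner="owner", repo="repo", branch="main", path="README.md"))
```

The resulting strings can be passed as the url parameter of an archive query.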
CDX API Reference
The Capture Index (CDX) API provides structured search across all archived URLs.
Basic Query Structure
https://web.archive.org/cdx/search/cdx?url={URL}&output=json
Essential Parameters
| Parameter | Effect | Example |
|---|---|---|
| matchType=exact | Exact URL only (default) | Single page |
| matchType=prefix | All URLs starting with path | All repo content |
| url=.../* | Wildcard (same as prefix) | github.com/owner/repo/* |
| from=YYYY | Start date filter | from=2023 |
| to=YYYY | End date filter | to=2024 |
| filter=statuscode:200 | Only successful captures | Skip redirects/errors |
| collapse=timestamp:8 | One capture per day | Reduce duplicates |
| collapse=urlkey | Unique URLs only | List all archived pages |
| limit=N | Limit results | limit=100 |
| output=json | JSON format | Machine-readable |
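The parameters in this table can be combined into a query string with the standard library. The sketch below is illustrative: `build_cdx_query` and its keyword names are hypothetical, and only the parameters listed in the table are used.

```python
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, *, match_type: Optional[str] = None,
                    year_from: Optional[int] = None,
                    year_to: Optional[int] = None,
                    only_ok: bool = True,
                    collapse: Optional[str] = None,
                    limit: Optional[int] = None) -> str:
    """Assemble a CDX query URL from the parameters in the table above."""
    params = [("url", url), ("output", "json")]
    if match_type:
        params.append(("matchType", match_type))
    if year_from:
        params.append(("from", str(year_from)))
    if year_to:
        params.append(("to", str(year_to)))
    if only_ok:
        params.append(("filter", "statuscode:200"))
    if collapse:
        params.append(("collapse", collapse))
    if limit:
        params.append(("limit", str(limit)))
    # Keep "/", ":" and "*" unescaped so the query stays readable.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe='/:*')}"


print(build_cdx_query("github.com/owner/repo/*", collapse="urlkey", limit=100))
```

A list of tuples is used instead of a dict because `from` is a reserved word in Python and the parameter order stays predictable.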
Query Examples
Find all archived pages under a repository:
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/facebook/react/*&output=json&collapse=urlkey"
Find archived issues for a specific repository:
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/issues/*&output=json&collapse=urlkey&filter=statuscode:200"
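Find archived snapshots of a specific file (the CDX wildcard only works at the end of a URL, so match the file path with a filter on the original field):
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/blob/*&filter=original:.*/path/to/file&output=json"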
Check for archived snapshots near a specific date:
curl -s "https://archive.org/wayback/available?url=github.com/owner/repo&timestamp=20230615"
CDX Response Format
[
["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
["com,github)/owner/repo", "20230615142311", "https://github.com/owner/repo", "text/html", "200", "ABC123...", "12345"]
]
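A response in this format can be turned into records keyed by the header row. The sketch below parses a sample payload modelled on the one above; the digest value is a placeholder, and the added `captured_at` and `snapshot_url` keys are hypothetical conveniences, not part of the API response.

```python
import json
from datetime import datetime
from typing import Dict, List

# Sample payload modelled on the response format above.
# "EXAMPLEDIGEST" is a placeholder, not a real digest.
SAMPLE = """[
  ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
  ["com,github)/owner/repo", "20230615142311", "https://github.com/owner/repo",
   "text/html", "200", "EXAMPLEDIGEST", "12345"]
]"""


def parse_cdx(payload: str) -> List[Dict]:
    """Convert CDX JSON (header row plus capture rows) into dicts."""
    rows = json.loads(payload)
    if len(rows) <= 1:  # empty result, or header row only
        return []
    header, *captures = rows
    records = []
    for row in captures:
        record = dict(zip(header, row))
        # Timestamps use the 14-digit YYYYMMDDhhmmss form.
        record["captured_at"] = datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
        record["snapshot_url"] = (
            f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"
        )
        records.append(record)
    return records


for rec in parse_cdx(SAMPLE):
    print(rec["captured_at"].isoformat(), rec["snapshot_url"])
```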
Investigation Patterns
Recovering Deleted File Contents
Scenario: Repository or file has been deleted, need to recover file contents.
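Step 1: Search for blob URLs (CDX wildcards only work at the end of a URL, so filter on the original field to match the file name)
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/blob/*&filter=original:.*/README.md&output=json"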
Step 2: Construct archive URL from timestamp
https://web.archive.org/web/20230615142311/https://github.com/owner/repo/blob/main/README.md
Step 3: Extract content manually or use waybackpack
pip install waybackpack
waybackpack "https://github.com/owner/repo/blob/main/README.md" -d output_dir
Forensic Value: Recover documentation, configuration files, or evidence that existed at specific points in time.
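When the goal is the state of a file at a specific point in time, a common need is the newest capture taken before a known cutoff, such as the deletion date. The following sketch selects it from CDX rows; the function name, the sample rows, and their digests are hypothetical.

```python
from typing import List, Optional

# Hypothetical CDX rows for one file; digests are placeholders.
SAMPLE_ROWS = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,github)/owner/repo/blob/main/readme.md", "20220101000000",
     "https://github.com/owner/repo/blob/main/README.md", "text/html", "200", "D1", "100"],
    ["com,github)/owner/repo/blob/main/readme.md", "20230615142311",
     "https://github.com/owner/repo/blob/main/README.md", "text/html", "200", "D2", "110"],
    ["com,github)/owner/repo/blob/main/readme.md", "20240101000000",
     "https://github.com/owner/repo/blob/main/README.md", "text/html", "200", "D3", "120"],
]


def latest_capture_before(rows: List[List[str]], cutoff: str) -> Optional[str]:
    """Return the replay URL of the newest capture at or before cutoff.

    cutoff is a timestamp prefix such as "2023" or "20231231235959".
    """
    if len(rows) <= 1:
        return None
    header, *captures = rows
    ts = header.index("timestamp")
    original = header.index("original")
    # Pad with 9s so a short prefix like "2023" covers the whole year.
    # Fixed-width digit strings sort chronologically as plain strings.
    limit = cutoff.ljust(14, "9")
    eligible = [row for row in captures if row[ts] <= limit]
    if not eligible:
        return None
    best = max(eligible, key=lambda row: row[ts])
    return f"https://web.archive.org/web/{best[ts]}/{best[original]}"


print(latest_capture_before(SAMPLE_ROWS, "2023"))
```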
Recovering Deleted Issue/PR Content
Scenario: Issue or PR was deleted and you need the original content.
Step 1: Query for issue page snapshots
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/issues/123*&output=json"
Step 2: Access archived page
https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/issues/123
Step 3: If issue number unknown, search PR/issue listing
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/issues?state=all&output=json"
Note: Archive Team has actively crawled GitHub issues and PRs since 2020, so issue content has a higher recovery success rate than file contents.
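When issue numbers are unknown, the original URLs returned by a prefix query can be scanned for them. This is a sketch under the assumption that the URLs follow the issues/{number} and pull/{number} patterns listed earlier; `archived_numbers` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable, List

# Matches .../issues/{number} and .../pull/{number} in an archived URL.
ISSUE_OR_PR = re.compile(r"github\.com/[^/]+/[^/]+/(issues|pull)/(\d+)", re.IGNORECASE)


def archived_numbers(original_urls: Iterable[str]) -> Dict[str, List[int]]:
    """Collect the distinct issue and PR numbers seen in archived URLs."""
    found = {"issues": set(), "pull": set()}
    for url in original_urls:
        match = ISSUE_OR_PR.search(url)
        if match:
            found[match.group(1).lower()].add(int(match.group(2)))
    return {kind: sorted(numbers) for kind, numbers in found.items()}


urls = [
    "https://github.com/owner/repo/issues/123",
    "https://github.com/owner/repo/issues/123#issuecomment-1",
    "https://github.com/owner/repo/pull/7/files",
    "https://github.com/owner/repo/issues?state=all",  # listing page, no number
]
print(archived_numbers(urls))
```

Each recovered number can then be queried individually to list its snapshots.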
Finding Forks of Deleted Repositories
Scenario: Repository is deleted, but forks may contain the full git history.
Step 1: Search for archived fork network page
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/network/members&output=json"
Step 2: Access archived network page
https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/network/members
Step 3: Extract fork usernames from archived page, check if forks still exist
# Check if fork exists
curl -s -o /dev/null -w "%{http_code}" https://github.com/forker/repo
Forensic Value: Active forks contain complete git history including all commits. This often yields better results than trying to recover individual files.
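Extracting fork owners from the archived network page can be approximated with a regular expression. The sketch below assumes that links in the archived HTML contain absolute github.com/{owner}/{repo} URLs; the real markup may differ between captures, and the sample HTML and the `fork_owners` name are hypothetical.

```python
import re
from typing import List


def fork_owners(html: str, repo: str, original_owner: str) -> List[str]:
    """List owners of github.com/{owner}/{repo} links, excluding the original."""
    # The lookahead rejects longer repository names such as "repo-tools".
    pattern = re.compile(
        r"github\.com/([A-Za-z0-9-]+)/" + re.escape(repo) + r"(?![\w.-])"
    )
    owners: List[str] = []
    for owner in pattern.findall(html):
        if owner.lower() != original_owner.lower() and owner not in owners:
            owners.append(owner)
    return owners


# Hypothetical fragment of an archived network/members page.
sample_html = """
<a href="/web/20230615142311/https://github.com/owner/repo/network/members">Forks</a>
<a href="/web/20230615142311/https://github.com/forker/repo">forker/repo</a>
<a href="/web/20230615142311/https://github.com/another-user/repo">another-user/repo</a>
<a href="/web/20230615142311/https://github.com/forker/repo-tools">forker/repo-tools</a>
"""

for name in fork_owners(sample_html, "repo", "owner"):
    print(f"https://github.com/{name}/repo")
```

Each printed URL is a candidate to test with the status-code check from Step 3.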
Recovering Wiki Content
Scenario: Repository wiki has been deleted or made private.
Step 1: Search for wiki pages
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/owner/repo/wiki*&output=json&collapse=urlkey"
Step 2: Access wiki home or specific pages
https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/wiki
https://web.archive.org/web/{TIMESTAMP}/https://github.com/owner/repo/wiki/Page-Name
Python Implementation
import requests
import json
from typing import Optional, List, Dict
from time import sleep


class WaybackGitHubRecovery:
    CDX_API = "https://web.archive.org/cdx/search/cdx"
    AVAILABILITY_API = "https://archive.org/wayback/available"
    ARCHIVE_URL = "https://web.archive.org/web"

    def check_availability(self, url: str, timestamp: Optional[str] = None) -> Optional[Dict]:
        """Check if URL has any archived snapshots."""
        params = {"url": url}
        if timestamp:
            params["timestamp"] = timestamp
        resp = requests.get(self.AVAILABILITY_API, params=params)
        data = resp.json()
        if data.get("archived_snapshots", {}).get("closest"):
            return data["archived_snapshots"]["closest"]
        return None

    def search_cdx(self, url: str, match_type: str = "prefix",
                   collapse: str = "urlkey", limit: int = 1000) -> List[Dict]:
        """Search CDX API for archived URLs."""
        params = {
            "url": url,
            "output": "json",
            "matchType": match_type,
            "collapse": collapse,
            "filter": "statuscode:200",
            "limit": limit
        }
        resp = requests.get(self.CDX_API, params=params)
        data = resp.json()
        if len(data) <= 1:  # Only header row
            return []
        headers = data[0]
        results = []
        for row in data[1:]:
            results.append(dict(zip(headers, row)))
        return results

    def find_repository_content(self, owner: str, repo: str) -> Dict[str, List]:
        """Find all archived content for a repository."""
        bas
---
*Content truncated.*