replit-incident-runbook

Execute Replit incident response procedures with triage, mitigation, and postmortem. Use when responding to Replit-related outages, investigating errors, or running post-incident reviews for Replit integration failures. Trigger with phrases like "replit incident", "replit outage", "replit down", "replit on-call", "replit emergency", "replit broken".

Install

mkdir -p .claude/skills/replit-incident-runbook && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8585" && unzip -o skill.zip -d .claude/skills/replit-incident-runbook && rm skill.zip

Installs to .claude/skills/replit-incident-runbook

Replit Incident Runbook

Overview

Rapid incident response for Replit deployment failures, database issues, and platform outages. Covers triage, diagnosis, remediation, rollback, and communication.

Prerequisites

Access to Replit Workspace and Deployment settings
Deployment URL for health checks
Communication channel (Slack, email)
Familiarity with the rollback flow (Deployment History)

Severity Levels

Level	Definition	Response Time	Examples
P1	Complete outage	< 15 min	App returns 5xx, DB down
P2	Degraded service	< 1 hour	Slow responses, intermittent errors
P3	Minor impact	< 4 hours	Non-critical feature broken
P4	No user impact	Next business day	Monitoring gap

Quick Triage (First 5 Minutes)

set -euo pipefail
DEPLOY_URL="https://your-app.replit.app"

echo "=== TRIAGE ==="

# 1. Check Replit platform status
echo -n "Replit Status: "
curl -s https://status.replit.com/api/v2/summary.json | \
  python3 -c "import sys,json;print(json.load(sys.stdin)['status']['description'])" 2>/dev/null || \
  echo "Check https://status.replit.com"

# 2. Check your deployment health
echo -n "App Health: "
curl -s -o /dev/null -w "HTTP %{http_code} (%{time_total}s)" "$DEPLOY_URL/health" 2>/dev/null || echo "UNREACHABLE"
echo ""

# 3. Get health details
echo "Health Response:"
curl -s "$DEPLOY_URL/health" 2>/dev/null | python3 -m json.tool 2>/dev/null || echo "No response"

# 4. Repeat the request: if this one is much faster, the first hit was likely an Autoscale cold start
echo -n "Second request: "
curl -s -o /dev/null -w "HTTP %{http_code} (%{time_total}s)\n" "$DEPLOY_URL/health" 2>/dev/null || echo "UNREACHABLE"
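The two timings printed by steps 2 and 4 can be turned into a rough verdict. The following is a minimal sketch, not part of the original runbook: `classify_probe` is a hypothetical helper, and the thresholds (2.0 s for "slow", a 3x ratio for "cold start") are illustrative assumptions rather than Replit guidance.

```shell
# classify_probe CODE FIRST_SECS SECOND_SECS
# CODE is the HTTP status of the first request; the two timings come from
# curl's %{time_total}. Thresholds below are assumptions, tune them per app.
classify_probe() {
  code=$1; first=$2; second=$3
  case "$code" in
    000) echo "UNREACHABLE"; return ;;
    5??) echo "SERVER_ERROR"; return ;;
    4??) echo "CLIENT_ERROR"; return ;;
  esac
  awk -v a="$first" -v b="$second" 'BEGIN {
    if (b > 2.0)                   print "SLOW"        # still slow when warm
    else if (a > 3 * b && a > 2.0) print "COLD_START"  # only the first hit was slow
    else                           print "OK"
  }'
}

classify_probe 200 4.8 0.2   # prints COLD_START
```

A SLOW verdict points at the database or memory rather than at Autoscale, since the instance was already warm for the second request.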

Decision Tree

App not responding?
├─ YES: Is status.replit.com reporting an incident?
│   ├─ YES → Platform issue. Wait for Replit. Communicate to users.
│   └─ NO → Your deployment issue. Continue below.
│
│   Can you access the Replit Workspace?
│   ├─ YES → Check deployment logs:
│   │   ├─ Build error → Fix code, redeploy
│   │   ├─ Runtime crash → Check logs, fix, redeploy
│   │   └─ Secret missing → Add to Secrets tab, redeploy
│   └─ NO → Network/browser issue. Try incognito window.
│
└─ NO: App responds but with errors?
    ├─ 5xx errors → Check logs for crash/exception
    ├─ Slow responses → Check database, cold start, memory
    └─ Auth not working → Verify deployment domain, not dev URL
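For on-call scripts, the top of the tree can be encoded as a small function. This is a sketch only: `next_action` and its yes/no arguments are hypothetical, and the answers still have to come from a human or from the triage checks.

```shell
# next_action APP_RESPONDING PLATFORM_INCIDENT WORKSPACE_OK   (each "yes" or "no")
# Mirrors the decision tree: platform status is checked before the workspace.
next_action() {
  if [ "$1" = "yes" ]; then
    echo "App responds: check logs for 5xx, database and cold start for latency, deployment domain for auth"
  elif [ "$2" = "yes" ]; then
    echo "Platform issue: wait for Replit and communicate to users"
  elif [ "$3" = "yes" ]; then
    echo "Deployment issue: check deployment logs, fix, redeploy"
  else
    echo "Network/browser issue: try an incognito window"
  fi
}

next_action no no yes   # prints the "Deployment issue" line
```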

Remediation by Error Type

Deployment Crash (5xx / App Unreachable)

1. Open Replit Workspace
2. Go to Deployment Settings > Logs
3. Look for the crash reason:
   - "Error: Cannot find module..." → Missing dependency
   - "FATAL: Missing secrets..." → Add to Secrets tab
   - "EADDRINUSE" → Port conflict in .replit config
   - "JavaScript heap out of memory" → Increase VM size or fix memory leak

4. Fix the issue in code
5. Click "Deploy" to redeploy
6. If fix is unclear, ROLLBACK:
   - Deployment Settings > History
   - Click "Rollback" on last known-good version
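If the logs are copied into a file, the crash patterns listed in step 3 can be matched mechanically. A minimal sketch, assuming the log text is piped in on stdin; `diagnose_log` is a hypothetical helper and only knows the four patterns named above.

```shell
# diagnose_log: reads log text on stdin and prints the first matching remediation.
# Returns 1 when none of the known patterns appear.
diagnose_log() {
  log=$(cat)
  case "$log" in
    *"Cannot find module"*) echo "Missing dependency: install it and redeploy" ;;
    *"Missing secrets"*)    echo "Missing secret: add it in the Secrets tab and redeploy" ;;
    *EADDRINUSE*)           echo "Port conflict: check the port settings in .replit" ;;
    *"heap out of memory"*) echo "Out of memory: increase VM size or fix the memory leak" ;;
    *)                      echo "No known pattern: consider rollback"; return 1 ;;
  esac
}

echo "Error: Cannot find module 'express'" | diagnose_log
```

With a saved log, the call would look like `diagnose_log < deploy.log`, where `deploy.log` is a placeholder file name.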

Database Connection Failure

1. Check database status in Database pane
2. Verify DATABASE_URL is set in Secrets
3. Test connection:

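# From Replit Shell (requires the pg package: npm install pg)
node -e "
const { Pool } = require('pg');
// rejectUnauthorized: false skips TLS certificate verification; use for diagnostics only
const pool = new Pool({ connectionString: process.env.DATABASE_URL, ssl: { rejectUnauthorized: false }, connectionTimeoutMillis: 5000 });
pool.query('SELECT NOW()')
  .then(r => console.log('OK:', r.rows[0]))
  .catch(e => { console.error('FAIL:', e.message); process.exitCode = 1; })
  .finally(() => pool.end());
"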

4. If connection fails:
   - Check if PostgreSQL is provisioned (Database pane)
   - Try creating a new database
   - Check for connection pool exhaustion (max connections)
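Before testing the connection it can help to confirm that DATABASE_URL is well-formed without printing the password into a shared terminal or incident channel. The sketch below is an assumption-laden helper (`describe_db_url` is a hypothetical name) that checks only the scheme and prints the host portion.

```shell
# describe_db_url URL: validates the scheme and prints host[:port] with credentials stripped.
describe_db_url() {
  url=$1
  case "$url" in
    "") echo "DATABASE_URL is empty or unset"; return 1 ;;
    postgres://*|postgresql://*) ;;
    *) echo "unexpected scheme (expected postgres:// or postgresql://)"; return 1 ;;
  esac
  rest=${url#*://}       # drop the scheme
  rest=${rest#*@}        # drop user:password@ if present
  hostport=${rest%%/*}   # keep host[:port], drop /database?params
  echo "scheme ok, host=${hostport}"
}

# Example with a made-up URL; in the Replit Shell pass "${DATABASE_URL:-}" instead.
describe_db_url "postgresql://user:s3cret@db.example.com:5432/app?sslmode=require"
# prints: scheme ok, host=db.example.com:5432
```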

Cold Start Too Slow (Autoscale)

If cold starts exceed acceptable latency:
1. Check deployment type: Autoscale scales to zero
2. Options:
   a. Switch to Reserved VM (always-on, no cold starts)
   b. Set up external keep-alive (ping /health every 4 min)
   c. Optimize startup: lazy imports, defer DB connection
3. To switch:
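   - Update .replit: set deploymentTarget = "gce" under [deployment] for Reserved VM ("cloudrun" is the Autoscale target), or change the deployment type in the Deployments pane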
   - Redeploy
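Option (b) can be sketched as a small loop run from any always-on machine. This is an illustration only: `keep_alive` is a hypothetical helper, the 240-second interval simply restates "every 4 min", and keeping an instance warm this way presumably trades the cold start for additional compute time.

```shell
# keep_alive URL INTERVAL_SECS [MAX_PINGS]
# Pings URL every INTERVAL_SECS seconds; MAX_PINGS=0 (default) runs until interrupted.
keep_alive() {
  url=$1; interval=$2; max=${3:-0}
  n=0
  while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
    # curl prints 000 when it cannot connect; fall back to 000 if curl itself fails
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url") || code=000
    echo "$(date -u +%H:%M:%S) $url -> HTTP $code"
    n=$((n + 1))
    sleep "$interval"
  done
}

# Usage (not run here): keep_alive "https://your-app.replit.app/health" 240
```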

Secrets Missing After Deploy

1. Open Secrets tab (lock icon in sidebar)
2. Verify all required secrets are present
3. Check Deployment Settings > Environment Variables
4. Secrets should sync to deployments automatically (as of 2025). If one is still missing:
   - Remove and re-add the secret
   - Redeploy
5. For Account-level secrets:
   - Account Settings > Secrets
   - These apply to ALL Repls
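Step 2 can be checked from the Replit Shell by testing each expected variable. A minimal sketch, assuming secrets are exposed as environment variables; `missing_secrets` is a hypothetical helper and the variable names passed to it are up to the caller.

```shell
# missing_secrets NAME...: prints each named variable that is unset or empty.
# Returns 1 if anything is missing. Names must be plain identifiers (eval is used).
missing_secrets() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "MISSING: $name"
      status=1
    fi
  done
  return "$status"
}

# Usage: missing_secrets DATABASE_URL SESSION_SECRET || echo "Add the secrets above, then redeploy"
```

Only the names are printed, never the values, so the output is safe to paste into an incident channel.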

Rollback Procedure

Replit supports one-click rollback to any previous deployment:

1. Deployment Settings > History
2. Find the last successful deployment
3. Click "Rollback to this version"
4. Verify health endpoint
5. Investigate root cause before redeploying fix

Rollback restores:
- Code at that deployment's commit
- Deployment configuration at that time

Rollback does NOT revert database changes.
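Step 4 (verify the health endpoint) is easy to skip under pressure, so a retrying check may help. The sketch below assumes the app exposes /health and returns 200 when healthy; `verify_health`, the attempt count, and the delay are illustrative choices.

```shell
# verify_health URL [ATTEMPTS] [DELAY_SECS]
# Returns 0 as soon as URL answers HTTP 200, 1 if every attempt fails.
verify_health() {
  url=$1; attempts=${2:-5}; delay=${3:-3}
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url") || code=000
    echo "attempt $i/$attempts: HTTP $code"
    if [ "$code" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (not run here):
# verify_health "https://your-app.replit.app/health" 5 3 && echo "Rollback verified"
```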

Communication Templates

Internal (Slack)

P[1-4] INCIDENT: [App Name] on Replit
Status: INVESTIGATING / IDENTIFIED / MONITORING / RESOLVED
Impact: [What users are experiencing]
Cause: [If known]
Action: [What we're doing]
ETA: [When we expect resolution]
Next update: [Time]
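To keep updates consistent during an incident, the internal template can be filled in by a function. This is a sketch; `incident_message` and its positional arguments are hypothetical, and how the text reaches Slack is left to the team's own tooling.

```shell
# incident_message SEVERITY APP STATUS IMPACT CAUSE ACTION ETA NEXT_UPDATE
incident_message() {
  cat <<EOF
$1 INCIDENT: $2 on Replit
Status: $3
Impact: $4
Cause: $5
Action: $6
ETA: $7
Next update: $8
EOF
}

incident_message P1 "Checkout API" INVESTIGATING "All requests return 503" \
  "Unknown" "Reviewing deployment logs" "TBD" "14:30 UTC"
```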

External (Status Page)

[App Name] Service Disruption

We are experiencing issues with [specific feature/service].
[Describe user impact].

We have identified the cause and are working on a fix.
Estimated resolution: [time].

Last updated: [timestamp]

Post-Incident

Evidence Collection

set -euo pipefail
# Capture deployment logs manually:
# Deployment Settings > Logs > copy the relevant entries

# Capture the timeline skeleton (replace each [time] with the actual timestamp)
cat > incident-report.md <<'EOF'
Timeline of events:
- [time] Issue detected
- [time] Investigation started
- [time] Root cause identified
- [time] Fix deployed / rollback executed
- [time] Service restored
EOF
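Timestamps are easier to trust when they are recorded as events happen rather than reconstructed afterwards. A minimal sketch of that idea, where `log_event` is a hypothetical helper that appends UTC-stamped entries in the same bullet format:

```shell
# log_event FILE MESSAGE: append a UTC-timestamped timeline entry to FILE.
log_event() {
  printf '%s\n' "- [$(date -u +%Y-%m-%dT%H:%M:%SZ)] $2" >> "$1"
}

# Usage during the incident:
# log_event incident-report.md "Issue detected"
# log_event incident-report.md "Rollback executed"
```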

Postmortem Template

## Incident: [Title]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** P[1-4]

### Summary
[1-2 sentence description of what happened]

### Root Cause
[Technical explanation]

### Timeline
- HH:MM — First alert
- HH:MM — Investigation started
- HH:MM — Root cause found
- HH:MM — Fix deployed / rollback
- HH:MM — Service restored

### Impact
- Users affected: [N]
- Downtime: [duration]

### Action Items
- [ ] [Prevention measure] — Owner — Due date
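The Duration field can be computed from the first and last timeline entries. A small sketch, assuming both are available as Unix epoch seconds (for example from `date +%s` at the time of each event); `format_duration` is a hypothetical helper.

```shell
# format_duration START_EPOCH END_EPOCH: prints the elapsed time as "X hours Y minutes".
format_duration() {
  secs=$(( $2 - $1 ))
  echo "$(( secs / 3600 )) hours $(( secs % 3600 / 60 )) minutes"
}

format_duration 1700000000 1700005400   # prints "1 hours 30 minutes"
```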

Error Handling

Issue	Cause	Solution
Can't access Workspace	Replit outage	Use status.replit.com, wait
Rollback not available	No previous deployments	Fix forward, deploy fix
Logs too short	Container restarted	Set up external log aggregator
DB rollback needed	Bad migration	Restore from Replit DB snapshot

Next Steps

For data handling patterns, see replit-data-handling.

Author

jeremylongshore
