Install
mkdir -p .claude/skills/incident-commander && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4080" && unzip -o skill.zip -d .claude/skills/incident-commander && rm skill.zipInstalls to .claude/skills/incident-commander
About this skill
Incident Commander Skill
Category: Engineering Team
Tier: POWERFUL
Author: Claude Skills Team
Version: 1.0.0
Last Updated: February 2026
Overview
The Incident Commander skill provides a comprehensive incident response framework for managing technology incidents from detection through resolution and post-incident review. This skill implements battle-tested practices from SRE and DevOps teams at scale, providing structured tools for severity classification, timeline reconstruction, and thorough post-incident analysis.
Key Features
- Automated Severity Classification - Intelligent incident triage based on impact and urgency metrics
- Timeline Reconstruction - Transform scattered logs and events into coherent incident narratives
- Post-Incident Review Generation - Structured PIRs with multiple RCA frameworks
- Communication Templates - Pre-built templates for stakeholder updates and escalations
- Runbook Integration - Generate actionable runbooks from incident patterns
Skills Included
Core Tools
-
Incident Classifier (
incident_classifier.py)- Analyzes incident descriptions and outputs severity levels
- Recommends response teams and initial actions
- Generates communication templates based on severity
-
Timeline Reconstructor (
timeline_reconstructor.py)- Processes timestamped events from multiple sources
- Reconstructs chronological incident timeline
- Identifies gaps and provides duration analysis
-
PIR Generator (
pir_generator.py)- Creates comprehensive Post-Incident Review documents
- Applies multiple RCA frameworks (5 Whys, Fishbone, Timeline)
- Generates actionable follow-up items
Incident Response Framework
Severity Classification System
SEV1 - Critical Outage
Definition: Complete service failure affecting all users or critical business functions
Characteristics:
- Customer-facing services completely unavailable
- Data loss or corruption affecting users
- Security breaches with customer data exposure
- Revenue-generating systems down
- SLA violations with financial penalties
Response Requirements:
- Immediate escalation to on-call engineer
- Incident Commander assigned within 5 minutes
- Executive notification within 15 minutes
- Public status page update within 15 minutes
- War room established
- All hands on deck if needed
Communication Frequency: Every 15 minutes until resolution
SEV2 - Major Impact
Definition: Significant degradation affecting subset of users or non-critical functions
Characteristics:
- Partial service degradation (>25% of users affected)
- Performance issues causing user frustration
- Non-critical features unavailable
- Internal tools impacting productivity
- Data inconsistencies not affecting user experience
Response Requirements:
- On-call engineer response within 15 minutes
- Incident Commander assigned within 30 minutes
- Status page update within 30 minutes
- Stakeholder notification within 1 hour
- Regular team updates
Communication Frequency: Every 30 minutes during active response
SEV3 - Minor Impact
Definition: Limited impact with workarounds available
Characteristics:
- Single feature or component affected
- <25% of users impacted
- Workarounds available
- Performance degradation not significantly impacting UX
- Non-urgent monitoring alerts
Response Requirements:
- Response within 2 hours during business hours
- Next business day response acceptable outside hours
- Internal team notification
- Optional status page update
Communication Frequency: At key milestones only
SEV4 - Low Impact
Definition: Minimal impact, cosmetic issues, or planned maintenance
Characteristics:
- Cosmetic bugs
- Documentation issues
- Logging or monitoring gaps
- Performance issues with no user impact
- Development/test environment issues
Response Requirements:
- Response within 1-2 business days
- Standard ticket/issue tracking
- No special escalation required
Communication Frequency: Standard development cycle updates
Incident Commander Role
Primary Responsibilities
-
Command and Control
- Own the incident response process
- Make critical decisions about resource allocation
- Coordinate between technical teams and stakeholders
- Maintain situational awareness across all response streams
-
Communication Hub
- Provide regular updates to stakeholders
- Manage external communications (status pages, customer notifications)
- Facilitate effective communication between response teams
- Shield responders from external distractions
-
Process Management
- Ensure proper incident tracking and documentation
- Drive toward resolution while maintaining quality
- Coordinate handoffs between team members
- Plan and execute rollback strategies if needed
-
Post-Incident Leadership
- Ensure thorough post-incident reviews are conducted
- Drive implementation of preventive measures
- Share learnings with broader organization
Decision-Making Framework
Emergency Decisions (SEV1/2):
- Incident Commander has full authority
- Bias toward action over analysis
- Document decisions for later review
- Consult subject matter experts but don't get blocked
Resource Allocation:
- Can pull in any necessary team members
- Authority to escalate to senior leadership
- Can approve emergency spend for external resources
- Make call on communication channels and timing
Technical Decisions:
- Lean on technical leads for implementation details
- Make final calls on trade-offs between speed and risk
- Approve rollback vs. fix-forward strategies
- Coordinate testing and validation approaches
Communication Templates
Initial Incident Notification (SEV1/2)
Subject: [SEV{severity}] {Service Name} - {Brief Description}
Incident Details:
- Start Time: {timestamp}
- Severity: SEV{level}
- Impact: {user impact description}
- Current Status: {investigating/mitigating/resolved}
Technical Details:
- Affected Services: {service list}
- Symptoms: {what users are experiencing}
- Initial Assessment: {suspected root cause if known}
Response Team:
- Incident Commander: {name}
- Technical Lead: {name}
- SMEs Engaged: {list}
Next Update: {timestamp}
Status Page: {link}
War Room: {bridge/chat link}
---
{Incident Commander Name}
{Contact Information}
Executive Summary (SEV1)
Subject: URGENT - Customer-Impacting Outage - {Service Name}
Executive Summary:
{2-3 sentence description of customer impact and business implications}
Key Metrics:
- Time to Detection: {X minutes}
- Time to Engagement: {X minutes}
- Estimated Customer Impact: {number/percentage}
- Current Status: {status}
- ETA to Resolution: {time or "investigating"}
Leadership Actions Required:
- [ ] Customer communication approval
- [ ] PR/Communications coordination
- [ ] Resource allocation decisions
- [ ] External vendor engagement
Incident Commander: {name} ({contact})
Next Update: {time}
---
This is an automated alert from our incident response system.
Customer Communication Template
We are currently experiencing {brief description of issue} affecting {scope of impact}.
Our engineering team was alerted at {time} and is actively working to resolve the issue. We will provide updates every {frequency} until resolved.
What we know:
- {factual statement of impact}
- {factual statement of scope}
- {brief status of response}
What we're doing:
- {primary response action}
- {secondary response action}
Workaround (if available):
{workaround steps or "No workaround currently available"}
We apologize for the inconvenience and will share more information as it becomes available.
Next update: {time}
Status page: {link}
Stakeholder Management
Stakeholder Classification
Internal Stakeholders:
- Engineering Leadership - Technical decisions and resource allocation
- Product Management - Customer impact assessment and feature implications
- Customer Support - User communication and support ticket management
- Sales/Account Management - Customer relationship management for enterprise clients
- Executive Team - Business impact decisions and external communication approval
- Legal/Compliance - Regulatory reporting and liability assessment
External Stakeholders:
- Customers - Service availability and impact communication
- Partners - API availability and integration impacts
- Vendors - Third-party service dependencies and support escalation
- Regulators - Compliance reporting for regulated industries
- Public/Media - Transparency for public-facing outages
Communication Cadence by Stakeholder
| Stakeholder | SEV1 | SEV2 | SEV3 | SEV4 |
|---|---|---|---|---|
| Engineering Leadership | Real-time | 30min | 4hrs | Daily |
| Executive Team | 15min | 1hr | EOD | Weekly |
| Customer Support | Real-time | 30min | 2hrs | As needed |
| Customers | 15min | 1hr | Optional | None |
| Partners | 30min | 2hrs | Optional | None |
Runbook Generation Framework
Dynamic Runbook Components
-
Detection Playbooks
- Monitoring alert definitions
- Triage decision trees
- Escalation trigger points
- Initial response actions
-
Response Playbooks
- Step-by-step mitigation procedures
- Rollback instructions
- Validation checkpoints
- Communication checkpoints
-
Recovery Playbooks
- Service restoration procedures
- Data consistency checks
- Performance validation
- User notification processes
Runbook Template Structure
# {Service/Component} Incident Response Runbook
## Quick Reference
- **Severity Indicators:** {list of conditions for each severity level}
- **Key Contacts:** {on-call rotations and escalation paths}
- **Critical Commands:** {list of emergency commands with descriptions
---
*Content truncated.*
More by alirezarezvani
View all skills by alirezarezvani →You might also like
flutter-development
aj-geddes
Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.
drawio-diagrams-enhanced
jgtolentino
Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.
ui-ux-pro-max
nextlevelbuilder
"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."
godot
bfollington
This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.
nano-banana-pro
garg-aayush
Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.
fastapi-templates
wshobson
Create production-ready FastAPI projects with async patterns, dependency injection, and comprehensive error handling. Use when building new FastAPI applications or setting up backend API projects.
Related MCP Servers
Browse all serversTerminal control, file system search, and diff-based file editing for Claude and other AI assistants. Execute shell comm
Desktop Commander MCP unifies code management with advanced source control, git, and svn support—streamlining developmen
Safely connect cloud Grafana to AI agents with MCP: query, inspect, and manage Grafana resources using simple, focused o
Integrate Datadog monitor for streamlined incident management. List and get incident info to enhance your observability
Connect with CrowdStrike Falcon, a leading endpoint protection platform, for intelligent security analysis and advanced
Integrate with Datadog for real-time metrics, logs, dashboards, and APM to optimize DevOps workflows. Learn about Datado
Stay ahead of the MCP ecosystem
Get weekly updates on new skills and servers.