Incident Commander Skill

Install

mkdir -p .claude/skills/incident-commander && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4080" && unzip -o skill.zip -d .claude/skills/incident-commander && rm skill.zip

Installs to .claude/skills/incident-commander

About this skill

Category: Engineering Team
Tier: POWERFUL
Author: Claude Skills Team
Version: 1.0.0
Last Updated: February 2026

Overview

The Incident Commander skill provides an incident response framework for managing technology incidents from detection through resolution and post-incident review. It draws on practices used by SRE and DevOps teams and provides structured tools for severity classification, timeline reconstruction, and post-incident analysis.

Key Features

  • Automated Severity Classification - Intelligent incident triage based on impact and urgency metrics
  • Timeline Reconstruction - Transform scattered logs and events into coherent incident narratives
  • Post-Incident Review Generation - Structured Post-Incident Reviews (PIRs) with multiple root cause analysis (RCA) frameworks
  • Communication Templates - Pre-built templates for stakeholder updates and escalations
  • Runbook Integration - Generate actionable runbooks from incident patterns

Skills Included

Core Tools

  1. Incident Classifier (incident_classifier.py)

    • Analyzes incident descriptions and outputs severity levels
    • Recommends response teams and initial actions
    • Generates communication templates based on severity
  2. Timeline Reconstructor (timeline_reconstructor.py)

    • Processes timestamped events from multiple sources
    • Reconstructs chronological incident timeline
    • Identifies gaps and provides duration analysis
  3. PIR Generator (pir_generator.py)

    • Creates comprehensive Post-Incident Review documents
    • Applies multiple RCA frameworks (5 Whys, Fishbone, Timeline)
    • Generates actionable follow-up items
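
The Timeline Reconstructor steps above (merge timestamped events from several sources, order them, then report gaps and duration) can be sketched roughly as follows. This is an illustration only, not the contents of timeline_reconstructor.py: the `Event` type, the function names, the source labels, and the 30-minute gap threshold are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str   # hypothetical label such as "alerts" or "chat"
    message: str


def reconstruct_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from every source into one chronological list."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.timestamp)


def find_gaps(timeline: list[Event], threshold: timedelta) -> list[tuple[Event, Event]]:
    """Return consecutive event pairs separated by more than `threshold`."""
    return [
        (earlier, later)
        for earlier, later in zip(timeline, timeline[1:])
        if later.timestamp - earlier.timestamp > threshold
    ]


def total_duration(timeline: list[Event]) -> timedelta:
    """Elapsed time between the first and last recorded events."""
    if not timeline:
        return timedelta(0)
    return timeline[-1].timestamp - timeline[0].timestamp


# Made-up sample data: one alert feed and one out-of-order chat log.
alerts = [Event(datetime(2026, 2, 1, 10, 0), "alerts", "High error rate")]
chat = [
    Event(datetime(2026, 2, 1, 10, 45), "chat", "Rollback started"),
    Event(datetime(2026, 2, 1, 10, 5), "chat", "IC assigned"),
]

timeline = reconstruct_timeline(alerts, chat)
gaps = find_gaps(timeline, threshold=timedelta(minutes=30))
```

With this sample data, the only gap longer than 30 minutes is the 40-minute stretch between "IC assigned" and "Rollback started", which is the kind of undocumented period a post-incident review would want explained.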

Incident Response Framework

Severity Classification System

SEV1 - Critical Outage

Definition: Complete service failure affecting all users or critical business functions

Characteristics:

  • Customer-facing services completely unavailable
  • Data loss or corruption affecting users
  • Security breaches with customer data exposure
  • Revenue-generating systems down
  • SLA violations with financial penalties

Response Requirements:

  • Immediate escalation to on-call engineer
  • Incident Commander assigned within 5 minutes
  • Executive notification within 15 minutes
  • Public status page update within 15 minutes
  • War room established
  • All hands on deck if needed

Communication Frequency: Every 15 minutes until resolution

SEV2 - Major Impact

Definition: Significant degradation affecting a subset of users or non-critical functions

Characteristics:

  • Partial service degradation (>25% of users affected)
  • Performance issues causing user frustration
  • Non-critical features unavailable
  • Internal tool outages impacting productivity
  • Data inconsistencies not affecting user experience

Response Requirements:

  • On-call engineer response within 15 minutes
  • Incident Commander assigned within 30 minutes
  • Status page update within 30 minutes
  • Stakeholder notification within 1 hour
  • Regular team updates

Communication Frequency: Every 30 minutes during active response

SEV3 - Minor Impact

Definition: Limited impact with workarounds available

Characteristics:

  • Single feature or component affected
  • <25% of users impacted
  • Workarounds available
  • Performance degradation not significantly impacting UX
  • Non-urgent monitoring alerts

Response Requirements:

  • Response within 2 hours during business hours
  • Next-business-day response acceptable outside business hours
  • Internal team notification
  • Optional status page update

Communication Frequency: At key milestones only

SEV4 - Low Impact

Definition: Minimal impact, cosmetic issues, or planned maintenance

Characteristics:

  • Cosmetic bugs
  • Documentation issues
  • Logging or monitoring gaps
  • Performance issues with no user impact
  • Development/test environment issues

Response Requirements:

  • Response within 1-2 business days
  • Standard ticket/issue tracking
  • No special escalation required

Communication Frequency: Standard development cycle updates
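
The severity definitions and response requirements above can be expressed as a small rule table. The sketch below is one possible encoding, not the logic of incident_classifier.py: the `Impact` fields and function names are hypothetical, and because the text gives ">25%" for SEV2 and "<25%" for SEV3 without covering exactly 25%, the sketch assumes that case falls under SEV3.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Impact:
    """Hypothetical triage inputs; field names are illustrative."""
    complete_outage: bool = False
    data_loss: bool = False
    customer_data_exposed: bool = False
    revenue_system_down: bool = False
    users_affected_pct: float = 0.0


def classify(impact: Impact) -> str:
    """Map impact signals to a severity level using the thresholds in the text."""
    if (impact.complete_outage or impact.data_loss
            or impact.customer_data_exposed or impact.revenue_system_down):
        return "SEV1"
    if impact.users_affected_pct > 25:
        return "SEV2"
    if impact.users_affected_pct > 0:
        return "SEV3"  # assumption: exactly 25% is treated as SEV3
    return "SEV4"


# Timed response requirements from the text, in minutes after detection.
# SEV3 and SEV4 are omitted because their targets depend on business hours.
RESPONSE_MINUTES = {
    "SEV1": {"incident_commander": 5, "executive_notification": 15, "status_page": 15},
    "SEV2": {"on_call_response": 15, "incident_commander": 30,
             "status_page": 30, "stakeholder_notification": 60},
}


def response_deadlines(severity: str, detected_at: datetime) -> dict[str, datetime]:
    """Return the wall-clock deadline for each timed response requirement."""
    minutes = RESPONSE_MINUTES.get(severity, {})
    return {action: detected_at + timedelta(minutes=m) for action, m in minutes.items()}
```

Keeping the thresholds in a data table rather than in branching code makes it easier to adjust them when an organization's SLAs differ from the defaults listed here.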

Incident Commander Role

Primary Responsibilities

  1. Command and Control

    • Own the incident response process
    • Make critical decisions about resource allocation
    • Coordinate between technical teams and stakeholders
    • Maintain situational awareness across all response streams
  2. Communication Hub

    • Provide regular updates to stakeholders
    • Manage external communications (status pages, customer notifications)
    • Facilitate effective communication between response teams
    • Shield responders from external distractions
  3. Process Management

    • Ensure proper incident tracking and documentation
    • Drive toward resolution while maintaining quality
    • Coordinate handoffs between team members
    • Plan and execute rollback strategies if needed
  4. Post-Incident Leadership

    • Ensure thorough post-incident reviews are conducted
    • Drive implementation of preventive measures
    • Share learnings with broader organization

Decision-Making Framework

Emergency Decisions (SEV1/2):

  • Incident Commander has full authority
  • Bias toward action over analysis
  • Document decisions for later review
  • Consult subject matter experts but don't get blocked

Resource Allocation:

  • Can pull in any necessary team members
  • Can escalate to senior leadership
  • Can approve emergency spend for external resources
  • Decides communication channels and timing

Technical Decisions:

  • Lean on technical leads for implementation details
  • Make final calls on trade-offs between speed and risk
  • Approve rollback vs. fix-forward strategies
  • Coordinate testing and validation approaches

Communication Templates

Initial Incident Notification (SEV1/2)

Subject: [SEV{severity}] {Service Name} - {Brief Description}

Incident Details:
- Start Time: {timestamp}
- Severity: SEV{level}
- Impact: {user impact description}
- Current Status: {investigating/mitigating/resolved}

Technical Details:
- Affected Services: {service list}
- Symptoms: {what users are experiencing}
- Initial Assessment: {suspected root cause if known}

Response Team:
- Incident Commander: {name}
- Technical Lead: {name}
- SMEs Engaged: {list}

Next Update: {timestamp}
Status Page: {link}
War Room: {bridge/chat link}

---
{Incident Commander Name}
{Contact Information}
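
Because the placeholders in this template contain spaces (for example {Service Name}), a small substitution helper is one way to fill it in programmatically. The following is a minimal sketch under that assumption; the `render` function, the sample values, and the choice to fail on missing fields are illustrative and not part of the skill.

```python
import re

# Subject line copied from the template above.
SUBJECT_TEMPLATE = "[SEV{severity}] {Service Name} - {Brief Description}"

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render(template: str, values: dict[str, str]) -> str:
    """Fill {placeholder} fields; raise KeyError if any field has no value."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"missing template values: {missing}")
    # A function replacement avoids backslash-escape handling in the values.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


subject = render(SUBJECT_TEMPLATE, {
    "severity": "1",
    "Service Name": "Checkout API",          # made-up example service
    "Brief Description": "Elevated 5xx errors",
})
```

Raising on a missing field, rather than leaving the raw placeholder in place, keeps a half-filled notification from being sent during an incident.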

Executive Summary (SEV1)

Subject: URGENT - Customer-Impacting Outage - {Service Name}

Executive Summary:
{2-3 sentence description of customer impact and business implications}

Key Metrics:
- Time to Detection: {X minutes}
- Time to Engagement: {X minutes} 
- Estimated Customer Impact: {number/percentage}
- Current Status: {status}
- ETA to Resolution: {time or "investigating"}

Leadership Actions Required:
- [ ] Customer communication approval
- [ ] PR/Communications coordination  
- [ ] Resource allocation decisions
- [ ] External vendor engagement

Incident Commander: {name} ({contact})
Next Update: {time}

---
This is an automated alert from our incident response system.
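
The two timing metrics in this template can be derived from three timestamps. The sketch below assumes Time to Detection means the interval from incident start to the first alert and Time to Engagement means the interval from that alert to a responder acknowledging it; the source does not define either term, and the timestamps are made up.

```python
from datetime import datetime


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from `start` to `end`."""
    return int((end - start).total_seconds() // 60)


# Made-up example timestamps.
started_at = datetime(2026, 2, 1, 9, 52)    # when impact began
detected_at = datetime(2026, 2, 1, 10, 0)   # first alert fired
engaged_at = datetime(2026, 2, 1, 10, 4)    # responder acknowledged

time_to_detection = minutes_between(started_at, detected_at)   # 8
time_to_engagement = minutes_between(detected_at, engaged_at)  # 4
```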

Customer Communication Template

We are currently experiencing {brief description of issue} affecting {scope of impact}. 

Our engineering team was alerted at {time} and is actively working to resolve the issue. We will provide updates every {frequency} until resolved.

What we know:
- {factual statement of impact}
- {factual statement of scope}
- {brief status of response}

What we're doing:
- {primary response action}
- {secondary response action}

Workaround (if available):
{workaround steps or "No workaround currently available"}

We apologize for the inconvenience and will share more information as it becomes available.

Next update: {time}
Status page: {link}

Stakeholder Management

Stakeholder Classification

Internal Stakeholders:

  • Engineering Leadership - Technical decisions and resource allocation
  • Product Management - Customer impact assessment and feature implications
  • Customer Support - User communication and support ticket management
  • Sales/Account Management - Customer relationship management for enterprise clients
  • Executive Team - Business impact decisions and external communication approval
  • Legal/Compliance - Regulatory reporting and liability assessment

External Stakeholders:

  • Customers - Service availability and impact communication
  • Partners - API availability and integration impacts
  • Vendors - Third-party service dependencies and support escalation
  • Regulators - Compliance reporting for regulated industries
  • Public/Media - Transparency for public-facing outages

Communication Cadence by Stakeholder

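| Stakeholder | SEV1 | SEV2 | SEV3 | SEV4 |
|---|---|---|---|---|
| Engineering Leadership | Real-time | 30 min | 4 hrs | Daily |
| Executive Team | 15 min | 1 hr | EOD | Weekly |
| Customer Support | Real-time | 30 min | 2 hrs | As needed |
| Customers | 15 min | 1 hr | Optional | None |
| Partners | 30 min | 2 hrs | Optional | None |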
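
The cadence table lends itself to a lookup that answers "when is the next update due for this audience?". The sketch below is a hedged illustration: the `CADENCE` structure and `next_update` function are hypothetical, fixed intervals are stored as `timedelta` values, and entries that are not fixed intervals (Real-time, EOD, As needed, Optional, None) are kept as labels and yield no computed deadline.

```python
from datetime import datetime, timedelta
from typing import Optional, Union

Cadence = Union[timedelta, str]

# Values transcribed from the table above.
CADENCE: dict[str, dict[str, Cadence]] = {
    "Engineering Leadership": {"SEV1": "Real-time", "SEV2": timedelta(minutes=30),
                               "SEV3": timedelta(hours=4), "SEV4": timedelta(days=1)},
    "Executive Team": {"SEV1": timedelta(minutes=15), "SEV2": timedelta(hours=1),
                       "SEV3": "EOD", "SEV4": timedelta(weeks=1)},
    "Customer Support": {"SEV1": "Real-time", "SEV2": timedelta(minutes=30),
                         "SEV3": timedelta(hours=2), "SEV4": "As needed"},
    "Customers": {"SEV1": timedelta(minutes=15), "SEV2": timedelta(hours=1),
                  "SEV3": "Optional", "SEV4": "None"},
    "Partners": {"SEV1": timedelta(minutes=30), "SEV2": timedelta(hours=2),
                 "SEV3": "Optional", "SEV4": "None"},
}


def next_update(stakeholder: str, severity: str, last_update: datetime) -> Optional[datetime]:
    """Return when the next update is due, or None if the cadence is not a fixed interval."""
    cadence = CADENCE[stakeholder][severity]
    if isinstance(cadence, timedelta):
        return last_update + cadence
    return None
```

For example, an Executive Team update sent at 10:00 during a SEV1 would make the next one due at 10:15, while a SEV3 entry for Customers returns `None` because updates there are optional.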

Runbook Generation Framework

Dynamic Runbook Components

  1. Detection Playbooks

    • Monitoring alert definitions
    • Triage decision trees
    • Escalation trigger points
    • Initial response actions
  2. Response Playbooks

    • Step-by-step mitigation procedures
    • Rollback instructions
    • Validation checkpoints
    • Communication checkpoints
  3. Recovery Playbooks

    • Service restoration procedures
    • Data consistency checks
    • Performance validation
    • User notification processes

Runbook Template Structure

# {Service/Component} Incident Response Runbook

## Quick Reference
- **Severity Indicators:** {list of conditions for each severity level}
- **Key Contacts:** {on-call rotations and escalation paths}
- **Critical Commands:** {list of emergency commands with descriptions}

---

*Content truncated.*
