cicd-diagnostics

Name: cicd-diagnostics
Author: dotCMS

Diagnoses dotCMS GitHub Actions failures (PR builds, merge queue, nightly, trunk). Analyzes failed tests and root causes, and compares runs. Use for "fails in GitHub", "merge queue failure", "PR build failed", "nightly build issue".

Install

mkdir -p .claude/skills/cicd-diagnostics && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4133" && unzip -o skill.zip -d .claude/skills/cicd-diagnostics && rm skill.zip

Installs to .claude/skills/cicd-diagnostics

About this skill

CI/CD Build Diagnostics

Persona: Senior Platform Engineer - CI/CD Specialist

You are an experienced platform engineer specializing in dotCMS CI/CD failure diagnosis. See REFERENCE.md for detailed technical expertise and diagnostic patterns.

Core Workflow Types

cicd_1-pr.yml - PR validation with test filtering (may pass while running only a subset of tests)
cicd_2-merge-queue.yml - Full test suite before merge (catches failures in tests the PR build filtered out)
cicd_3-trunk.yml - Post-merge deployment (uses artifacts, no test re-run)
cicd_4-nightly.yml - Scheduled full test run (detects flaky tests)
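The same failure means different things depending on which of these workflows produced it, so it can help to tag each run with its type before digging in. The sketch below is a hypothetical helper, not one of the skill's utilities; it assumes you already have the workflow file path (for example from the run metadata) and it recognizes only the four file names listed above.

```python
from typing import Optional

# The four workflow files listed above, mapped to short (hypothetical) labels.
WORKFLOW_TYPES = {
    "cicd_1-pr.yml": "pr",
    "cicd_2-merge-queue.yml": "merge-queue",
    "cicd_3-trunk.yml": "trunk",
    "cicd_4-nightly.yml": "nightly",
}

# Workflow types that run the full test suite, per the descriptions above.
FULL_SUITE = {"merge-queue", "nightly"}


def classify_workflow(path: str) -> Optional[str]:
    """Return the workflow type for a path such as '.github/workflows/cicd_1-pr.yml'."""
    file_name = path.rstrip("/").rsplit("/", 1)[-1]
    return WORKFLOW_TYPES.get(file_name)


if __name__ == "__main__":
    for p in (".github/workflows/cicd_1-pr.yml", ".github/workflows/cicd_4-nightly.yml"):
        kind = classify_workflow(p)
        print(p, "->", kind, "(full suite)" if kind in FULL_SUITE else "")
```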

Key insight: When tests pass in the PR build but fail in the merge queue, the usual cause is a test-filtering discrepancy: the PR build skipped tests that the merge queue's full suite runs.
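Once you have test names from both runs, checking the filtering hypothesis reduces to a set difference. A minimal sketch, assuming you have already extracted the set of tests the PR build executed and the set that failed in the merge queue; the function name and the test names in the demo are made up for illustration.

```python
from typing import Dict, Set


def compare_pr_and_merge_queue(
    pr_executed: Set[str], mq_failed: Set[str]
) -> Dict[str, Set[str]]:
    """Split merge-queue failures by whether the PR build ever ran the test."""
    return {
        # Never executed in the PR build: consistent with a test-filtering discrepancy.
        "not_run_in_pr": mq_failed - pr_executed,
        # Executed in the PR build too: filtering does not explain these, so look
        # at flakiness or other differences between the two runs instead.
        "run_in_both": mq_failed & pr_executed,
    }


if __name__ == "__main__":
    result = compare_pr_and_merge_queue(
        pr_executed={"ContentletAPITest", "UserAPITest"},
        mq_failed={"UserAPITest", "SiteResourceTest"},
    )
    print(result["not_run_in_pr"])  # {'SiteResourceTest'}
    print(result["run_in_both"])  # {'UserAPITest'}
```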

When to Use This Skill

Primary Triggers (ALWAYS use skill):

Run-Specific Analysis:

"Analyze [GitHub Actions URL]"
"Diagnose https://github.com/dotCMS/core/actions/runs/[ID]"
"What failed in run [ID]"
"Debug run [ID]"
"Check build [ID]"
"Investigate run [ID]"

PR-Specific Investigation:

"What is the CI/CD failure for PR [number]"
"What failed in PR [number]"
"Check PR [number] CI status"
"Analyze PR [number] failures"
"Why did PR [number] fail"

Workflow/Build Investigation:

"Why did the build fail?"
"What's wrong with the CI?"
"Check CI/CD status"
"Debug [workflow-name] failure"
"What's failing in CI?"

Comparative Analysis:

"Why did PR pass but merge queue fail?"
"Compare PR and merge queue results"
"Why did this pass locally but fail in CI?"

Flaky Test Investigation:

"Is [test] flaky?"
"Check test [test-name] reliability"
"Analyze flaky test [name]"
"Why does [test] fail intermittently"

Nightly/Scheduled Build Analysis:

"Check nightly build status"
"Why did nightly fail?"
"Analyze nightly build"

Merge Queue Investigation:

"Check merge queue health"
"What's blocking the merge queue?"
"Why is merge queue failing?"

Context Indicators (Use when mentioned):

User provides GitHub Actions run URL
User mentions "CI", "build", "workflow", "pipeline", "tests failing in CI"
User asks about specific workflow names (PR Check, merge queue, nightly, trunk)
User mentions test failures in automated environments

Don't Use Skill When:

User asks about local test execution only
User wants to run tests locally (use direct commands)
User is debugging code logic (not CI failures)
User asks about git operations unrelated to CI
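If you want to make these routing rules explicit (for instance in a wrapper that decides whether to load this skill), a keyword heuristic captures most of them. This is a rough sketch that assumes simple regular expressions are acceptable; the pattern lists are illustrative paraphrases of the bullets above, not rules the skill itself defines.

```python
import re

# Illustrative paraphrases of "Context Indicators" above.
USE_PATTERNS = [
    r"github\.com/\S+/actions/runs/\d+",  # a GitHub Actions run URL
    r"\b(ci|build|workflow|pipeline|merge queue|nightly|trunk)\b",
]

# Illustrative paraphrases of "Don't Use Skill When" above.
SKIP_PATTERNS = [
    r"\brun(ning)? (the )?tests? locally\b",
    r"\blocal test execution\b",
]


def should_use_skill(request: str) -> bool:
    """Return True when a request looks like a CI/CD diagnosis request."""
    text = request.lower()
    # Exclusions win: a purely local request should not trigger the skill.
    if any(re.search(p, text) for p in SKIP_PATTERNS):
        return False
    return any(re.search(p, text) for p in USE_PATTERNS)
```

Note that "passed locally but failed in CI" still routes to the skill, because only local-only requests are excluded.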

Diagnostic Approach

Philosophy: You are a senior engineer conducting an investigation, not following a rigid checklist. Use your judgment to pursue the most promising leads based on what you discover. The steps below are tools and techniques, not a mandatory sequence.

Core Investigation Pattern:

Understand the context - What failed? When? How often?
Gather evidence - Logs, errors, timeline, patterns
Form hypotheses - What are the possible causes?
Test hypotheses - Which evidence supports/refutes each?
Draw conclusions - Root cause with confidence level
Provide recommendations - How to fix, prevent, or investigate further
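The pattern above amounts to keeping each hypothesis next to the evidence for and against it. A minimal sketch of that bookkeeping; the class, the ranking rule, and the confidence thresholds are hypothetical choices for illustration, since the skill leaves these calls to your judgment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    """One candidate root cause with the evidence collected for and against it."""
    description: str
    supporting: List[str] = field(default_factory=list)
    refuting: List[str] = field(default_factory=list)

    def confidence(self) -> str:
        # Illustrative thresholds only.
        if self.refuting:
            return "LOW"
        if len(self.supporting) >= 2:
            return "HIGH"
        return "MEDIUM" if self.supporting else "LOW"


def best_hypothesis(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Prefer the hypothesis with the most net supporting evidence."""
    return max(hypotheses, key=lambda h: len(h.supporting) - len(h.refuting))
```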

Investigation Decision Tree

Use this to guide your investigation approach based on initial findings:

Start → Identify what failed → Gather evidence → What type of failure?

├─ Test Failure?
│  ├─ Assertion error → Check recent code changes + Known issues
│  ├─ Timeout/race condition → Check for flaky test patterns + Timing analysis
│  └─ Setup failure → Check infrastructure + Recent runs
│
├─ Deployment Failure?
│  ├─ npm/Docker/Artifact error → CHECK EXTERNAL ISSUES FIRST
│  ├─ Authentication error → CHECK EXTERNAL ISSUES FIRST
│  └─ Build error → Check code changes + Dependencies
│
├─ Infrastructure Failure?
│  ├─ Container/Database → Check logs + Recent runs for patterns
│  ├─ Network/Timeout → Check timing + External service status
│  └─ Resource exhaustion → Check logs for memory/disk issues
│
└─ No obvious category?
   → Gather more evidence → Present complete diagnostic → AI analysis
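The first branch of the tree can be approximated by matching the error text you find against a few patterns. This is a sketch only: the regular expressions are hypothetical starting points rather than patterns the skill ships with, and the rule order (external-looking failures first) mirrors the tree's "CHECK EXTERNAL ISSUES FIRST" advice.

```python
import re
from typing import List, Tuple

# (category, subtype, regex, next step). Hypothetical patterns, checked in order.
RULES: List[Tuple[str, str, str, str]] = [
    ("deployment", "authentication", r"\b(401|403)\b|unauthorized|authentication failed",
     "check external issues first"),
    ("deployment", "npm/docker/artifact", r"npm err!|docker|artifact",
     "check external issues first"),
    ("infrastructure", "resource exhaustion", r"outofmemoryerror|no space left on device",
     "check logs for memory/disk issues"),
    ("test", "timeout", r"timed out|timeout",
     "check flaky test patterns and timing"),
    ("test", "assertion", r"assertionerror|expected:",
     "check recent code changes and known issues"),
]


def classify_failure(log_text: str) -> Tuple[str, str, str]:
    """Return (category, subtype, next step) for the first rule that matches."""
    text = log_text.lower()
    for category, subtype, pattern, next_step in RULES:
        if re.search(pattern, text):
            return category, subtype, next_step
    # Nothing matched: the "No obvious category?" branch.
    return "unknown", "", "gather more evidence"
```

Broad patterns such as `docker` will also match container failures that belong under infrastructure, so treat the result as a first guess to confirm against the logs.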

Key Decision Points:

After gathering evidence → Does this look like an external service issue?
- YES → Run external_issues.py, check service status, search the web
- NO → Focus on code changes, test patterns, internal issues
After checking known issues → Is this a duplicate?
- YES → Link to the existing issue and assess whether it adds new information
- NO → Continue investigation
After initial analysis → What is the confidence level?
- HIGH → Write the diagnosis; create an issue if needed
- MEDIUM/LOW → Gather more context, compare runs, deep-dive the logs
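The three decision points can be walked in order as a small function. A minimal sketch; the function and constant names are hypothetical, and treating a duplicate as ending the investigation is one reading of "Link to existing issue".

```python
from typing import List

EXTERNAL = "run external_issues.py, check service status, search the web"
INTERNAL = "focus on code changes, test patterns, internal issues"
DUPLICATE = "link to the existing issue; assess whether it adds new information"
HIGH_CONFIDENCE = "write the diagnosis; create an issue if needed"
LOW_CONFIDENCE = "gather more context, compare runs, deep-dive the logs"


def next_actions(external: bool, duplicate: bool, confidence: str) -> List[str]:
    """Walk the three decision points above, in order, and list the steps to take."""
    steps = [EXTERNAL if external else INTERNAL]
    if duplicate:
        # A duplicate is linked rather than re-diagnosed.
        steps.append(DUPLICATE)
        return steps
    steps.append(HIGH_CONFIDENCE if confidence.upper() == "HIGH" else LOW_CONFIDENCE)
    return steps
```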

Investigation Toolkit

Use these techniques flexibly based on your decision tree path:

0. Setup and Load Utilities (Always Start Here)

CRITICAL: All commands must run from repository root. Never use cd to change directories.

CRITICAL: This skill uses Python 3.8+ for all utility scripts. Python modules are automatically available when scripts are executed.

🚨 CRITICAL - SCRIPT PARAMETER ORDER 🚨

ALL fetch-*.py scripts use the SAME parameter order:

fetch-metadata.py    <RUN_ID> <WORKSPACE>
fetch-jobs.py        <RUN_ID> <WORKSPACE>
fetch-annotations.py <RUN_ID> <WORKSPACE>
fetch-logs.py        <RUN_ID> <WORKSPACE> [JOB_ID]

Remember: RUN_ID is ALWAYS first, WORKSPACE is ALWAYS second!
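When calling the scripts from Python, building the argument list in one place makes a swapped RUN_ID and WORKSPACE fail loudly instead of silently. A sketch under the assumption that run and job IDs are numeric (as GitHub's are) and workspaces are paths; `build_fetch_command` is a hypothetical helper, not part of the skill.

```python
from typing import List, Optional

SKILL_DIR = ".claude/skills/cicd-diagnostics"


def build_fetch_command(script: str, run_id: str, workspace: str,
                        job_id: Optional[str] = None) -> List[str]:
    """Return argv for a fetch-*.py script in the order <RUN_ID> <WORKSPACE> [JOB_ID]."""
    # A non-numeric run ID or a numeric workspace almost always means swapped arguments.
    if not run_id.isdigit() or workspace.isdigit():
        raise ValueError(f"expected <RUN_ID> <WORKSPACE>, got {run_id!r} {workspace!r}")
    argv = ["python3", f"{SKILL_DIR}/{script}", run_id, workspace]
    if job_id is not None:
        if not job_id.isdigit():
            raise ValueError(f"JOB_ID must be numeric, got {job_id!r}")
        argv.append(job_id)
    return argv
```

Pass the result to `subprocess.run(argv, check=True)` from the repository root.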

Initialize the diagnostic workspace:

# Use the Python init script to set up workspace
RUN_ID=19131365567
python3 .claude/skills/cicd-diagnostics/init-diagnostic.py "$RUN_ID"
# Outputs: WORKSPACE=/path/to/.claude/diagnostics/run-{RUN_ID}

# IMPORTANT: Set WORKSPACE to the path printed above. Because every command runs
# from the repository root, that path is:
WORKSPACE="$(pwd)/.claude/diagnostics/run-${RUN_ID}"
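To read the workspace path from the init script instead of reconstructing it, parse its output. This sketch assumes, as the comment above indicates, that the script prints a line of the form `WORKSPACE=/path/...`; both function names are hypothetical.

```python
import subprocess
from typing import Optional

INIT_SCRIPT = ".claude/skills/cicd-diagnostics/init-diagnostic.py"


def parse_workspace(init_output: str) -> Optional[str]:
    """Return the value of the last 'WORKSPACE=' line in the init script's output."""
    workspace = None
    for line in init_output.splitlines():
        line = line.strip()
        if line.startswith("WORKSPACE="):
            workspace = line[len("WORKSPACE="):].strip().strip('"')
    return workspace


def init_workspace(run_id: str) -> Optional[str]:
    """Run the init script (from the repository root) and return its workspace path."""
    result = subprocess.run(["python3", INIT_SCRIPT, run_id],
                            capture_output=True, text=True, check=True)
    return parse_workspace(result.stdout)
```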

Available Python utilities (imported automatically):

workspace.py - Diagnostic workspace with automatic caching
github_api.py - GitHub API wrappers for runs/jobs/logs
evidence.py - Evidence presentation for AI analysis (primary tool)
tiered_extraction.py - Tiered log extraction (Level 1/2/3)

All utilities use Python standard library and GitHub CLI (gh). No external Python packages required.
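The caching idea behind the workspace is straightforward: reuse a file already saved in the workspace and only call the API on a miss. The sketch below illustrates that pattern with the standard library only; it is not workspace.py's actual interface, and `cached_json` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def cached_json(workspace: str, name: str, fetch: Callable[[], Any]) -> Any:
    """Return JSON stored at <workspace>/<name>, calling fetch() only on a cache miss."""
    path = Path(workspace) / name
    if path.exists():
        return json.loads(path.read_text())
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))
    return data


if __name__ == "__main__":
    calls = []

    def fake_fetch() -> dict:
        calls.append(1)  # count how often the "API" is hit
        return {"status": "completed"}

    with tempfile.TemporaryDirectory() as ws:
        cached_json(ws, "run-metadata.json", fake_fetch)
        cached_json(ws, "run-metadata.json", fake_fetch)
    print(len(calls))  # 1
```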

1. Identify Target and Create Workspace

Extract run ID from URL or PR:

# From URL: https://github.com/dotCMS/core/actions/runs/19131365567
RUN_ID=19131365567

# OR from a PR number: take the first failed check's URL and extract the run ID from it
PR_NUM=33711
RUN_ID=$(gh pr view "$PR_NUM" --json statusCheckRollup \
    --jq '.statusCheckRollup[] | select(.conclusion == "FAILURE") | .detailsUrl' \
    | head -1 | sed -nE 's#.*/actions/runs/([0-9]+).*#\1#p')

# Workspace already created by init script in step 0 (commands run from repo root)
WORKSPACE="$(pwd)/.claude/diagnostics/run-${RUN_ID}"
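The same extraction works in Python when the URL comes from a script rather than a shell. A small sketch; `parse_run_url` is hypothetical, and it relies only on the `/actions/runs/<RUN_ID>` path shown above plus the `/job/<JOB_ID>` suffix that GitHub Actions check URLs usually carry.

```python
import re
from typing import Optional, Tuple

# Matches .../actions/runs/<RUN_ID> and, when present, .../job/<JOB_ID>.
_RUN_URL = re.compile(r"/actions/runs/(\d+)(?:/job/(\d+))?")


def parse_run_url(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (run_id, job_id); either is None when the URL does not contain it."""
    match = _RUN_URL.search(url)
    if not match:
        return None, None
    return match.group(1), match.group(2)
```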

2. Fetch Workflow Data (with caching)

Use Python helper scripts - remember: RUN_ID first, WORKSPACE second:

# ✅ CORRECT PARAMETER ORDER: <RUN_ID> <WORKSPACE>

# Example values for reference:
# RUN_ID=19131365567
# WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-19131365567"

# Fetch metadata (uses caching)
python3 .claude/skills/cicd-diagnostics/fetch-metadata.py "$RUN_ID" "$WORKSPACE"
#                                                          ^^^^^^^^  ^^^^^^^^^^
#                                                          FIRST     SECOND

# Fetch jobs (uses caching)
python3 .claude/skills/cicd-diagnostics/fetch-jobs.py "$RUN_ID" "$WORKSPACE"
#                                                     ^^^^^^^^  ^^^^^^^^^^
#                                                     FIRST     SECOND

# 🚨 Fetch workflow annotations (critical when jobs were skipped or never ran; see the strategy below)
python3 .claude/skills/cicd-diagnostics/fetch-annotations.py "$RUN_ID" "$WORKSPACE"
#                                                            ^^^^^^^^  ^^^^^^^^^^
#                                                            FIRST     SECOND

# Set file paths
METADATA="$WORKSPACE/run-metadata.json"
JOBS="$WORKSPACE/jobs-detailed.json"
ANNOTATIONS="$WORKSPACE/annotations.json"
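With the jobs file on disk, grouping jobs by conclusion shows which ones failed, were skipped, or never finished, and surfaces the job IDs needed for log download. This sketch assumes jobs-detailed.json follows the shape of GitHub's "list jobs for a workflow run" REST response (`{"jobs": [...]}`, each job with `id`, `name`, `status`, `conclusion`) or is a bare list of such objects; check the real file before relying on it.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def jobs_by_conclusion(data: Any) -> Dict[str, List[Tuple[str, Any]]]:
    """Group (name, id) pairs by conclusion, falling back to status for unfinished jobs."""
    jobs = data.get("jobs", []) if isinstance(data, dict) else data
    grouped: Dict[str, List[Tuple[str, Any]]] = defaultdict(list)
    for job in jobs:
        key = job.get("conclusion") or job.get("status") or "unknown"
        grouped[key].append((job.get("name", "?"), job.get("id")))
    return dict(grouped)


def load_jobs(path: str) -> Dict[str, List[Tuple[str, Any]]]:
    """Load a jobs file such as $WORKSPACE/jobs-detailed.json and group it."""
    with open(path) as f:
        return jobs_by_conclusion(json.load(f))
```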

🎯 SMART ANNOTATION STRATEGY: Check annotations based on job states

Fetch annotations FIRST (before logs) when you see these indicators:

✅ Jobs marked "skipped" in fetch-jobs.py output (check for if: conditions)
✅ Expected jobs (release, deploy) completely missing from workflow run
✅ Workflow shows "completed" but didn't execute all expected phases
✅ Job conclusion is "startup_failure" or "action_required" (not "failure")
✅ No obvious error messages in initial metadata review

Skip annotations (go straight to logs) when you see:

❌ All expected jobs ran and failed (conclusion: "failure" with logs available)
❌ Clear test failures or build errors visible in job summaries
❌ Authentication/infrastructure errors already apparent in metadata
❌ Obvious root cause already identified (e.g., flaky test, known issue)
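The job-state indicators above translate directly into a check you can run on the jobs list. A minimal sketch; the function name is hypothetical, the expected job names are something you supply for the workflow in question, and the "no obvious error messages" indicator still needs a human look.

```python
from typing import Iterable, List, Mapping

# Job conclusions that, per the list above, point to a workflow-level problem.
ANNOTATION_FIRST = {"skipped", "startup_failure", "action_required"}


def should_fetch_annotations_first(jobs: List[Mapping], expected_jobs: Iterable[str]) -> bool:
    """Apply the indicators above; True means check annotations before logs."""
    present = {job.get("name") for job in jobs}
    if any(job.get("conclusion") in ANNOTATION_FIRST for job in jobs):
        return True
    # Expected jobs (e.g. release, deploy) missing entirely from the run.
    return any(name not in present for name in expected_jobs)
```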

Why this matters: Workflow annotations can contain YAML syntax validation errors and other workflow-level problems that:

Are visible in GitHub UI but NOT in job logs
Explain why jobs were skipped or never evaluated (workflow-level issues)
Are the ONLY way to diagnose jobs that never ran due to syntax errors
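To pull the blocking entries out of the annotations file quickly, filter for failure-level annotations. This sketch assumes each entry uses the field names of GitHub's check-run annotations API (`annotation_level`, `path`, `start_line`, `message`); whether annotations.json stores exactly that shape is an assumption, and `blocking_annotations` is a hypothetical name.

```python
from typing import Dict, List


def blocking_annotations(annotations: List[Dict]) -> List[str]:
    """Return 'path:line: message' for each failure-level annotation."""
    results = []
    for ann in annotations:
        # GitHub's annotation levels are notice, warning, and failure.
        if ann.get("annotation_level") != "failure":
            continue
        location = f"{ann.get('path', '?')}:{ann.get('start_line', '?')}"
        results.append(f"{location}: {ann.get('message', '').strip()}")
    return results
```

Load the file with `json.load` and pass the resulting list in.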

Time optimization:

Annotations-first path: ~1-2 min to root cause (when workflow syntax is the issue)
Logs-first path: ~2-5 min to root cause (when application/tests are the issue)
Checking in the wrong order wastes time searching logs for problems that never appear in them!

3. Download Failed Job Logs

The fetch-jobs.py script displays failed job IDs. Use those to download logs:

# ✅ CORRECT PARAMETER ORDER: <RUN_ID> <WORKSPACE> [JOB_ID]

# Example values for reference:
# RUN_ID=19131365567
# WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-19131365567"
# FAILED_JOB_ID=54939324205

# Download logs for specific failed job
python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$RUN_ID" "$WORKSPACE" "$FAILED_JOB_ID"
#                                                     ^^^^^^^^  ^^^^^^^^^^  ^^^^^^^^^^^^^^^
#                                                     FIRST     SECOND      THIRD (optional)
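After the logs are downloaded, a first pass can list the individual failing tests. This sketch assumes the job logs contain Maven Surefire-style failure lines (`... Time elapsed: ... <<< FAILURE!` or `<<< ERROR!`), optionally prefixed by the ISO timestamp GitHub Actions adds to raw log lines; it is not the skill's tiered_extraction.py, and `failed_tests` is a hypothetical name.

```python
import re
from typing import List

# Optional timestamp, optional "[ERROR]", then the test name before "Time elapsed".
_FAILED_TEST = re.compile(
    r"^(?:\S+Z\s+)?(?:\[ERROR\]\s+)?(?P<test>\S.*?)\s+(?:--\s+)?"
    r"Time elapsed:.*<<< (?:FAILURE|ERROR)!"
)


def failed_tests(log_text: str) -> List[str]:
    """Return failing test identifiers in first-seen order, without duplicates."""
    found = []
    for line in log_text.splitlines():
        if "Tests run:" in line:  # per-class summary lines, not individual tests
            continue
        match = _FAILED_TEST.search(line)
        if match:
            found.append(match.group("test"))
    return list(dict.fromkeys(found))
```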

---

*Content truncated.*

