Interactive debugging for failed e2e tests. Orchestrates the debugging session but delegates log reading to subagents to keep the main conversation clean. Use for ping-pong debugging sessions where you want to form and test hypotheses together with the user.

Install

mkdir -p .claude/skills/debug-e2e && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4310" && unzip -o skill.zip -d .claude/skills/debug-e2e && rm skill.zip

Installs to .claude/skills/debug-e2e

About this skill

E2E Test Debugging

Interactive debugging for failed e2e tests. This skill orchestrates the debugging session but never reads logs directly - it delegates to subagents to keep the conversation context clean.

Invocation

The user can invoke this skill with:

  • CI log hash: /debug-e2e 343c52b17688d2cd
  • PR number: /debug-e2e #19783 or /debug-e2e 19783
  • CI URL: /debug-e2e http://ci.aztec-labs.com/...
  • Test name: /debug-e2e epochs_l1_reorgs (for general investigation)
  • No argument: /debug-e2e, then ask the user what they want to debug

When to Use

  • Debugging flaky or failing e2e tests
  • Investigating CI failures that need deep analysis
  • When you want to collaborate with the user on forming hypotheses
  • When comparing failed and successful runs

When NOT to Use

  • Obvious assertion failures: If the test output clearly shows "expected 5, got 3", just investigate the code directly
  • Build/compilation errors: Use standard debugging, not log analysis
  • Simple configuration issues: Missing env vars, wrong paths, etc.
  • When the user just wants a quick answer: This skill is for interactive ping-pong debugging sessions

Key Principle

Never read logs directly in this conversation. Logs can be 50k+ lines and would pollute the context. Instead:

  1. Use identify-ci-failures subagent to find failures and download logs
  2. Use analyze-logs subagent to deep-dive specific logs
  3. Work with the summaries they return

Workflow

Step 1: Identify Failures

Spawn the identify-ci-failures subagent:

Use Task tool with subagent_type: "identify-ci-failures"
Prompt: "Identify CI failures for [PR number / CI URL / hash]"

This returns:

  • List of failures with types
  • Local file paths for downloaded logs (e.g., /tmp/<hash>.log)
  • History URL for finding successful runs

Step 2: Discuss with User

Present findings to the user:

  • What tests failed?
  • What type of failure (timeout, assertion, error)?
  • Form initial hypotheses together

Step 3: Deep Dive with analyze-logs

Spawn the analyze-logs subagent with the local file path:

Use Task tool with subagent_type: "analyze-logs"
Prompt: "Analyze /tmp/<hash>.log focusing on test '<test_name>'. Look for [specific thing based on hypothesis]"

For comparison:

Prompt: "Compare /tmp/<failed>.log with /tmp/<success>.log for test '<test_name>'. Find divergence points."

Step 4: Refine Hypothesis

Based on the summary:

  • Does the evidence support the hypothesis?
  • What contradicts it?
  • What new questions arise?

Discuss with user, then spawn another analyze-logs if needed.

Step 5: Investigate Codebase

Once you have a theory, search the codebase:

  • Use Grep to find where specific log messages are generated (see the sketch after this list)
  • Read the code context around log emission points
  • Trace execution paths
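
For example, a minimal command-line sketch of finding a log message's origin (the message text and repo path are placeholders; the Grep tool takes an equivalent pattern):

# Find where a log message quoted in an analyze-logs summary is emitted
grep -rn "clock disparity" --include='*.ts' yarn-project/

# Then read the surrounding file to understand when and why that line fires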

Step 6: Suggest Fix or Local Test

Either:

  • Propose a code fix based on findings
  • Suggest running the test locally to verify:
    yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
    

Hypothesis Formation

Take time to think deeply before proposing theories.

For each hypothesis:

  1. Clearly state the theory: "The test fails because X happens when Y"
  2. Identify expected evidence: "If this is correct, we should see log entries for Z"
  3. Ask analyze-logs to verify: Spawn subagent to look for specific evidence
  4. Look for contradictions: What would disprove this theory?
  5. Assign confidence: high / medium / low based on evidence

Formulate multiple competing hypotheses when the cause is unclear.
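
A compact illustration of steps 1-3, reusing the clock-disparity theory from the example session below (all specifics are illustrative):

Theory: "The test fails because the new clock disparity check rejects epoch proof messages when node clocks drift during the epoch transition."
Expected evidence: rejection or clock-disparity warnings on the receiving node around the epoch boundary.
Verification prompt: "Analyze /tmp/<hash>.log for test '<test_name>'. Look for gossip message rejections or clock disparity warnings and report their timestamps."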

Investigation Principles

  • Be systematic: Follow the workflow, don't jump to conclusions
  • Be evidence-based: Every theory must be backed by log entries or code
  • Be critical: Actively seek to disprove your own hypotheses
  • Be thorough: Check timing, sequence, missing events, code context
  • Be clear: Use specific timestamps and quotes from summaries
  • Be practical: Suggest fixes that address root causes

History Investigation

To understand when a test started failing:

  1. Look for the history: marker at the beginning of the log file (first few lines)
  2. The history shows recent runs of this exact test with PASSED/FAILED/FLAKED status:
    01-23 17:10:11: PASSED (2614d91ec48f4047): ... (Author: commit message (#PR))
    01-23 17:08:30: FLAKED (10d5f47f04025f1c): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
    01-23 16:51:21: FLAKED (512e978edff9e471): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
    
  3. Identify the transition point where test started failing/flaking
  4. Check the PR mentioned in the commit message to understand what changed
  5. Download logs from both passing and failing runs to compare (see the sketch after this list):
    • Use hash from history (e.g., 2614d91ec48f4047 for passed, 10d5f47f04025f1c for failed)
    • yarn ci dlog <hash> > /tmp/<hash>.log 2>&1 downloads the log to a local tmp file

Important: Do NOT use gh run list - the history in the log file is more accurate for this specific test.
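
Putting step 5 together, a minimal sketch using the hashes from the example history above (the hashes are illustrative):

# Download a passing and a failing run for comparison
yarn ci dlog 2614d91ec48f4047 > /tmp/2614d91ec48f4047.log 2>&1   # PASSED run
yarn ci dlog 10d5f47f04025f1c > /tmp/10d5f47f04025f1c.log 2>&1   # FLAKED run

# Hand both paths to the analyze-logs subagent rather than reading them here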

Local Test Running

To run tests locally for verification:

# Run specific test
yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With verbose logging
LOG_LEVEL=verbose yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With debug logging (very detailed)
LOG_LEVEL=debug yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With specific module logging
LOG_LEVEL='info; debug:sequencer,p2p' yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
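
If the local run is long, capturing it to a file keeps this conversation clean and lets analyze-logs inspect it afterwards (the output path is illustrative):

# Capture a local run for later analysis by the analyze-logs subagent
LOG_LEVEL=verbose yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>' > /tmp/local-run.log 2>&1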

Log Structure

Timestamp Format

Logs use ISO timestamps: 2024-01-23T17:08:30.123Z - useful for correlating events across nodes.

Log Levels

  • ERROR - Failures, exceptions
  • WARN - Potential issues, recoverable problems
  • INFO - Key events, state transitions
  • VERBOSE - Detailed operational info
  • DEBUG - Fine-grained debugging (very noisy)

Component Prefixes

Log lines are prefixed with the component name (e.g., aztec:sequencer, aztec:p2p, aztec:archiver). These map to the Key Packages section in CLAUDE.md - use that as a reference for understanding what each component does.
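
When a hypothesis points at a single component, name the prefix in the subagent prompt, for example (prefix and placeholders illustrative):

Prompt: "Analyze /tmp/<hash>.log focusing on aztec:sequencer lines for test '<test_name>'. Summarize errors and state transitions around <timestamp>."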

Multi-Node Debugging

E2E tests often spawn multiple nodes. Key tips:

Identifying Nodes

  • Look for node identifiers in log prefixes: node-0, node-1, validator-0, etc.
  • Each node has its own log stream but they're interleaved in the combined output
  • Ask analyze-logs to filter by node when needed

Cross-Node Correlation

  • Use timestamps to correlate events across nodes
  • Look for message propagation: "Node A sends X" → "Node B receives X"
  • Check for missing events: if Node A sent but Node B never received, that's a clue
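
A comparison-style prompt covering both points above might look like this (node names and the message are illustrative):

Prompt: "Analyze /tmp/<hash>.log for test '<test_name>'. Filter to node-0 and node-1 and check whether the epoch proof message sent by node-0 was received by node-1. Report timestamps on both sides, or note if the receive never happened."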

Common Multi-Node Issues

  • Network partition: Nodes can't reach each other
  • Clock skew: Timestamps don't align, causing validation failures
  • Split brain: Nodes have divergent views of state
  • Message ordering: Events arrive in unexpected order

Common Failure Patterns

Timeout Failures

  • Action executed, expected reaction didn't occur
  • Ask analyze-logs to find: what was the last action? What was expected?
  • Check for blocked operations, missing events, stuck processes
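
For instance (placeholders illustrative):

Prompt: "Analyze /tmp/<hash>.log for test '<test_name>'. Identify the last action that completed before the timeout, what the test was waiting for, and whether the expected event appears anywhere later in the log."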

Assertion Failures

  • Test expectation not met
  • Compare actual vs expected values
  • Trace back to find where the wrong value originated

State Pollution

  • Previous test left system in bad state
  • Compare logs from the failing test with the same test in isolation
  • Check beforeEach/afterEach hooks

Race Conditions

  • Timing-dependent failures
  • Look for operations completing in different order
  • Check for missing synchronization

Example Session

User: Debug this flaky test: 343c52b17688d2cd

Claude: [Spawns identify-ci-failures subagent]
        Returns: epochs_l1_reorgs.test.ts failed with timeout after 383s
        Log downloaded to /tmp/343c52b17688d2cd.log
        History shows test started flaking recently

Claude: The test "handles reorgs spanning epoch boundaries" timed out. Looking at
        the history, this test has been flaking since yesterday. The failures correlate
        with PR #19783 "adding MAXIMUM_GOSSIP_CLOCK_DISPARITY for p2p validation".

        Let me check a passing run to compare. [Spawns analyze-logs for both]

User: What do you see as differences?

Claude: [Spawns analyze-logs with comparison prompt]
        Summary shows: In failed run, node-2 never received epoch proof notification.
        In passing run, all nodes received it within 2s of each other.

        Hypothesis: The new clock disparity check may be rejecting valid messages
        during epoch transitions when clocks drift slightly.

User: Can you check if there are any logs about clock validation?

Claude: [Spawns analyze-logs with specific prompt]
        Found: "Rejecting gossip message: clock disparity 3.2s exceeds maximum 2s"
        This confirms the hypothesis.

Claude: [Searches codebase for MAXIMUM_GOSSIP_CLOCK_DISPARITY]
        Found in p2p/config.ts, default is 2 seconds. The PR added this validation
        but epoch proofs can take longer to propagate. Recommend increasing the
        disparity threshold for epoch-related messages.
