Interactive debugging for failed e2e tests. Orchestrates the debugging session but delegates log reading to subagents to keep the main conversation clean. Use for ping-pong debugging sessions where you want to form and test hypotheses together with the user.

Install

mkdir -p .claude/skills/debug-e2e && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4310" && unzip -o skill.zip -d .claude/skills/debug-e2e && rm skill.zip

Installs to .claude/skills/debug-e2e

About this skill

E2E Test Debugging

Interactive debugging for failed e2e tests. This skill orchestrates the debugging session but never reads logs directly - it delegates to subagents to keep the conversation context clean.

Invocation

The user can invoke this skill with:

  • CI log hash: /debug-e2e 343c52b17688d2cd
  • PR number: /debug-e2e #19783 or /debug-e2e 19783
  • CI URL: /debug-e2e http://ci.aztec-labs.com/...
  • Test name: /debug-e2e epochs_l1_reorgs (for general investigation)
  • No argument: /debug-e2e, then ask the user what they want to debug

When to Use

  • Debugging flaky or failing e2e tests
  • Investigating CI failures that need deep analysis
  • When you want to collaborate with the user on forming hypotheses
  • When comparing failed and successful runs

When NOT to Use

  • Obvious assertion failures: If the test output clearly shows "expected 5, got 3", just investigate the code directly
  • Build/compilation errors: Use standard debugging, not log analysis
  • Simple configuration issues: Missing env vars, wrong paths, etc.
  • When the user just wants a quick answer: This skill is for interactive ping-pong debugging sessions

Key Principle

Never read logs directly in this conversation. Logs can be 50k+ lines and would pollute the context. Instead:

  1. Use identify-ci-failures subagent to find failures and download logs
  2. Use analyze-logs subagent to deep-dive specific logs
  3. Work with the summaries they return

Workflow

Step 1: Identify Failures

Spawn the identify-ci-failures subagent:

Use Task tool with subagent_type: "identify-ci-failures"
Prompt: "Identify CI failures for [PR number / CI URL / hash]"

This returns:

  • List of failures with types
  • Local file paths for downloaded logs (e.g., /tmp/<hash>.log)
  • History URL for finding successful runs

Step 2: Discuss with User

Present findings to the user:

  • What tests failed?
  • What type of failure (timeout, assertion, error)?
  • Form initial hypotheses together

Step 3: Deep Dive with analyze-logs

Spawn the analyze-logs subagent with the local file path:

Use Task tool with subagent_type: "analyze-logs"
Prompt: "Analyze /tmp/<hash>.log focusing on test '<test_name>'. Look for [specific thing based on hypothesis]"

For comparison:

Prompt: "Compare /tmp/<failed>.log with /tmp/<success>.log for test '<test_name>'. Find divergence points."

Step 4: Refine Hypothesis

Based on the summary:

  • Does the evidence support the hypothesis?
  • What contradicts it?
  • What new questions arise?

Discuss with user, then spawn another analyze-logs if needed.

Step 5: Investigate Codebase

Once you have a theory, search the codebase:

  • Use Grep to find where specific log messages are generated (see the sketch after this list)
  • Read the code context around log emission points
  • Trace execution paths
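
For example, a minimal command-line sketch of finding a log message's origin (the message text and repo path are placeholders; the Grep tool takes an equivalent pattern):

# Find where a log message quoted in an analyze-logs summary is emitted
grep -rn "clock disparity" --include='*.ts' yarn-project/

# Then read the surrounding file to understand when and why that line fires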

Step 6: Suggest Fix or Local Test

Either:

  • Propose a code fix based on findings
  • Suggest running the test locally to verify:
    yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
    

Hypothesis Formation

Take time to think deeply before proposing theories.

For each hypothesis:

  1. Clearly state the theory: "The test fails because X happens when Y"
  2. Identify expected evidence: "If this is correct, we should see log entries for Z"
  3. Ask analyze-logs to verify: Spawn subagent to look for specific evidence
  4. Look for contradictions: What would disprove this theory?
  5. Assign confidence: high / medium / low based on evidence

Formulate multiple competing hypotheses when the cause is unclear.
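
A compact illustration of steps 1-3, reusing the clock-disparity theory from the example session below (all specifics are illustrative):

Theory: "The test fails because the new clock disparity check rejects epoch proof messages when node clocks drift during the epoch transition."
Expected evidence: rejection or clock-disparity warnings on the receiving node around the epoch boundary.
Verification prompt: "Analyze /tmp/<hash>.log for test '<test_name>'. Look for gossip message rejections or clock disparity warnings and report their timestamps."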

Investigation Principles

  • Be systematic: Follow the workflow, don't jump to conclusions
  • Be evidence-based: Every theory must be backed by log entries or code
  • Be critical: Actively seek to disprove your own hypotheses
  • Be thorough: Check timing, sequence, missing events, code context
  • Be clear: Use specific timestamps and quotes from summaries
  • Be practical: Suggest fixes that address root causes

History Investigation

To understand when a test started failing:

  1. Look for the history: marker at the beginning of the log file (first few lines)
  2. The history shows recent runs of this exact test with PASSED/FAILED/FLAKED status:
    01-23 17:10:11: PASSED (2614d91ec48f4047): ... (Author: commit message (#PR))
    01-23 17:08:30: FLAKED (10d5f47f04025f1c): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
    01-23 16:51:21: FLAKED (512e978edff9e471): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
    
  3. Identify the transition point where test started failing/flaking
  4. Check the PR mentioned in the commit message to understand what changed
  5. Download logs from both passing and failing runs to compare (see the sketch after this list):
    • Use hash from history (e.g., 2614d91ec48f4047 for passed, 10d5f47f04025f1c for failed)
    • yarn ci dlog <hash> > /tmp/<hash>.log 2>&1 downloads the log to a local tmp file

Important: Do NOT use gh run list - the history in the log file is more accurate for this specific test.
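
Putting step 5 together, a minimal sketch using the hashes from the example history above (the hashes are illustrative):

# Download a passing and a failing run for comparison
yarn ci dlog 2614d91ec48f4047 > /tmp/2614d91ec48f4047.log 2>&1   # PASSED run
yarn ci dlog 10d5f47f04025f1c > /tmp/10d5f47f04025f1c.log 2>&1   # FLAKED run

# Hand both paths to the analyze-logs subagent rather than reading them here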

Local Test Running

To run tests locally for verification:

# Run specific test
yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With verbose logging
LOG_LEVEL=verbose yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With debug logging (very detailed)
LOG_LEVEL=debug yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With specific module logging
LOG_LEVEL='info; debug:sequencer,p2p' yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
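
If the local run is long, capturing it to a file keeps this conversation clean and lets analyze-logs inspect it afterwards (the output path is illustrative):

# Capture a local run for later analysis by the analyze-logs subagent
LOG_LEVEL=verbose yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>' > /tmp/local-run.log 2>&1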

Log Structure

Timestamp Format

Logs use ISO timestamps: 2024-01-23T17:08:30.123Z - useful for correlating events across nodes.

Log Levels

  • ERROR - Failures, exceptions
  • WARN - Potential issues, recoverable problems
  • INFO - Key events, state transitions
  • VERBOSE - Detailed operational info
  • DEBUG - Fine-grained debugging (very noisy)

Component Prefixes

Log lines are prefixed with the component name (e.g., aztec:sequencer, aztec:p2p, aztec:archiver). These map to the Key Packages section in CLAUDE.md - use that as a reference for understanding what each component does.
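
When a hypothesis points at a single component, name the prefix in the subagent prompt, for example (prefix and placeholders illustrative):

Prompt: "Analyze /tmp/<hash>.log focusing on aztec:sequencer lines for test '<test_name>'. Summarize errors and state transitions around <timestamp>."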

Multi-Node Debugging

E2E tests often spawn multiple nodes. Key tips:

Identifying Nodes

  • Look for node identifiers in log prefixes: node-0, node-1, validator-0, etc.
  • Each node has its own log stream but they're interleaved in the combined output
  • Ask analyze-logs to filter by node when needed

Cross-Node Correlation

  • Use timestamps to correlate events across nodes
  • Look for message propagation: "Node A sends X" → "Node B receives X"
  • Check for missing events: if Node A sent but Node B never received, that's a clue
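
A comparison-style prompt covering both points above might look like this (node names and the message are illustrative):

Prompt: "Analyze /tmp/<hash>.log for test '<test_name>'. Filter to node-0 and node-1 and check whether the epoch proof message sent by node-0 was received by node-1. Report timestamps on both sides, or note if the receive never happened."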

Common Multi-Node Issues

  • Network partition: Nodes can't reach each other
  • Clock skew: Timestamps don't align, causing validation failures
  • Split brain: Nodes have divergent views of state
  • Message ordering: Events arrive in unexpected order

Common Failure Patterns

Timeout Failures

  • Action executed, expected reaction didn't occur
  • Ask analyze-logs to find: what was the last action? What was expected?
  • Check for blocked operations, missing events, stuck processes
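
For instance (placeholders illustrative):

Prompt: "Analyze /tmp/<hash>.log for test '<test_name>'. Identify the last action that completed before the timeout, what the test was waiting for, and whether the expected event appears anywhere later in the log."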

Assertion Failures

  • Test expectation not met
  • Compare actual vs expected values
  • Trace back to find where the wrong value originated

State Pollution

  • Previous test left system in bad state
  • Compare logs from the failing test with the same test in isolation
  • Check beforeEach/afterEach hooks

Race Conditions

  • Timing-dependent failures
  • Look for operations completing in different order
  • Check for missing synchronization

Example Session

User: Debug this flaky test: 343c52b17688d2cd

Claude: [Spawns identify-ci-failures subagent]
        Returns: epochs_l1_reorgs.test.ts failed with timeout after 383s
        Log downloaded to /tmp/343c52b17688d2cd.log
        History shows test started flaking recently

Claude: The test "handles reorgs spanning epoch boundaries" timed out. Looking at
        the history, this test has been flaking since yesterday. The failures correlate
        with PR #19783 "adding MAXIMUM_GOSSIP_CLOCK_DISPARITY for p2p validation".

        Let me check a passing run to compare. [Spawns analyze-logs for both]

User: What do you see as differences?

Claude: [Spawns analyze-logs with comparison prompt]
        Summary shows: In failed run, node-2 never received epoch proof notification.
        In passing run, all nodes received it within 2s of each other.

        Hypothesis: The new clock disparity check may be rejecting valid messages
        during epoch transitions when clocks drift slightly.

User: Can you check if there are any logs about clock validation?

Claude: [Spawns analyze-logs with specific prompt]
        Found: "Rejecting gossip message: clock disparity 3.2s exceeds maximum 2s"
        This confirms the hypothesis.

Claude: [Searches codebase for MAXIMUM_GOSSIP_CLOCK_DISPARITY]
        Found in p2p/config.ts, default is 2 seconds. The PR added this validation
        but epoch proofs can take longer to propagate. Recommend increasing the
        disparity threshold for epoch-related messages.
