# debug-e2e

Interactive debugging for failed e2e tests. Orchestrates the debugging session but delegates log reading to subagents to keep the main conversation clean. Use for ping-pong debugging sessions where you want to form and test hypotheses together with the user.

## Install

```shell
mkdir -p .claude/skills/debug-e2e && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4310" && unzip -o skill.zip -d .claude/skills/debug-e2e && rm skill.zip
```

Installs to `.claude/skills/debug-e2e`.
## About this skill

### E2E Test Debugging

Interactive debugging for failed e2e tests. This skill orchestrates the debugging session but never reads logs directly; it delegates to subagents to keep the conversation context clean.
## Invocation

The user can invoke this skill with:

- CI log hash: `/debug-e2e 343c52b17688d2cd`
- PR number: `/debug-e2e #19783` or `/debug-e2e 19783`
- CI URL: `/debug-e2e http://ci.aztec-labs.com/...`
- Test name: `/debug-e2e epochs_l1_reorgs` (for general investigation)
- No argument: `/debug-e2e`, then ask the user what they want to debug
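The argument forms above can be told apart by shape alone. A minimal shell sketch (the `classify_arg` function and its heuristics are illustrative, not part of the skill):

```shell
# classify_arg: guess which invocation form an argument is
# (illustrative heuristics only; the real skill may disambiguate differently)
classify_arg() {
  arg="$1"
  case "$arg" in
    "")        echo "none"; return ;;       # no argument: ask the user
    http*://*) echo "ci-url"; return ;;     # CI URL
    "#"*)      echo "pr-number"; return ;;  # e.g. #19783
  esac
  if printf '%s' "$arg" | grep -qE '^[0-9]+$'; then
    echo "pr-number"                        # bare number, e.g. 19783
  elif printf '%s' "$arg" | grep -qE '^[0-9a-f]{16}$'; then
    echo "log-hash"                         # 16-char hex CI log hash
  else
    echo "test-name"                        # anything else, e.g. epochs_l1_reorgs
  fi
}

classify_arg 343c52b17688d2cd   # log-hash
classify_arg '#19783'           # pr-number
```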
## When to Use
- Debugging flaky or failing e2e tests
- Investigating CI failures that need deep analysis
- When you want to collaborate with the user on forming hypotheses
- When comparing failed and successful runs
## When NOT to Use

- Obvious assertion failures: if the test output clearly shows `expected 5, got 3`, just investigate the code directly
- Build/compilation errors: use standard debugging, not log analysis
- Simple configuration issues: missing env vars, wrong paths, etc.
- When the user just wants a quick answer: this skill is for interactive ping-pong debugging sessions
## Key Principle

Never read logs directly in this conversation. Logs can be 50k+ lines and would pollute the context. Instead:

- Use the `identify-ci-failures` subagent to find failures and download logs
- Use the `analyze-logs` subagent to deep-dive into specific logs
- Work with the summaries they return
## Workflow

### Step 1: Identify Failures

Spawn the `identify-ci-failures` subagent:

```
Use Task tool with subagent_type: "identify-ci-failures"
Prompt: "Identify CI failures for [PR number / CI URL / hash]"
```

This returns:

- A list of failures with types
- Local file paths for downloaded logs (e.g., `/tmp/<hash>.log`)
- A history URL for finding successful runs
### Step 2: Discuss with User

Present findings to the user:

- What tests failed?
- What type of failure (timeout, assertion, error)?
- Form initial hypotheses together
### Step 3: Deep Dive with analyze-logs

Spawn the `analyze-logs` subagent with the local file path:

```
Use Task tool with subagent_type: "analyze-logs"
Prompt: "Analyze /tmp/<hash>.log focusing on test '<test_name>'. Look for [specific thing based on hypothesis]"
```

For comparison:

```
Prompt: "Compare /tmp/<failed>.log with /tmp/<success>.log for test '<test_name>'. Find divergence points."
```
### Step 4: Refine Hypothesis

Based on the summary:

- Does the evidence support the hypothesis?
- What contradicts it?
- What new questions arise?

Discuss with the user, then spawn another `analyze-logs` if needed.
### Step 5: Investigate Codebase

Once you have a theory, search the codebase:

- Use Grep to find where specific log messages are generated
- Read the code context around log emission points
- Trace execution paths
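Finding the emission point is usually a grep for the literal, non-templated part of a message quoted in a subagent summary. A self-contained sketch (the source tree, file name, and log message below are stand-ins created for illustration):

```shell
# Stand-in source tree; in a real session you would grep the actual repo
mkdir -p /tmp/fake-repo/p2p/src
cat > /tmp/fake-repo/p2p/src/gossip_validator.ts <<'EOF'
logger.warn(`Rejecting gossip message: clock disparity ${disparity}s exceeds maximum ${max}s`);
EOF

# Grep only a distinctive literal fragment: interpolated parts like ${disparity}
# won't appear verbatim in the source, so they must be left out of the pattern
grep -rn "clock disparity" /tmp/fake-repo --include='*.ts'
```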
### Step 6: Suggest Fix or Local Test

Either:

- Propose a code fix based on the findings
- Suggest running the test locally to verify:

```shell
yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
```
## Hypothesis Formation

Take time to think deeply before proposing theories. For each hypothesis:

- Clearly state the theory: "The test fails because X happens when Y"
- Identify expected evidence: "If this is correct, we should see log entries for Z"
- Ask analyze-logs to verify: spawn a subagent to look for the specific evidence
- Look for contradictions: what would disprove this theory?
- Assign confidence: high / medium / low, based on evidence

Formulate multiple competing hypotheses when the cause is unclear.
## Investigation Principles

- Be systematic: follow the workflow; don't jump to conclusions
- Be evidence-based: every theory must be backed by log entries or code
- Be critical: actively seek to disprove your own hypotheses
- Be thorough: check timing, sequence, missing events, and code context
- Be clear: use specific timestamps and quotes from summaries
- Be practical: suggest fixes that address root causes
## History Investigation

To understand when a test started failing:

- Look for the `history:` marker at the beginning of the log file (first few lines)
- The history shows recent runs of this exact test with PASSED/FAILED/FLAKED status:

```
01-23 17:10:11: PASSED (2614d91ec48f4047): ... (Author: commit message (#PR))
01-23 17:08:30: FLAKED (10d5f47f04025f1c): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
01-23 16:51:21: FLAKED (512e978edff9e471): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
```

- Identify the transition point where the test started failing/flaking
- Check the PR mentioned in the commit message to understand what changed
- Download logs from both passing and failing runs to compare:
  - Use hashes from the history (e.g., `2614d91ec48f4047` for a passed run, `10d5f47f04025f1c` for a failed one)
  - `yarn ci dlog <hash> > /tmp/<hash>.log 2>&1` downloads the log to a local tmp file

Important: do NOT use `gh run list` - the history in the log file is more accurate for this specific test.
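As a sketch, the pass/fail hashes can be pulled straight out of those history lines. The log content below is inlined for illustration; a real log would come from `yarn ci dlog <hash> > /tmp/<hash>.log 2>&1`:

```shell
# Inlined sample of a downloaded log's first lines (illustrative content)
cat > /tmp/sample.log <<'EOF'
history:
01-23 17:10:11: PASSED (2614d91ec48f4047): ... (Author: commit message (#PR))
01-23 17:08:30: FLAKED (10d5f47f04025f1c): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
2024-01-23T17:08:30.123Z aztec:sequencer INFO test output begins here
EOF

# Grab one passing and one failing/flaking hash to download and compare
pass_hash=$(grep ' PASSED ' /tmp/sample.log | grep -oE '[0-9a-f]{16}' | head -1)
fail_hash=$(grep -E ' (FAILED|FLAKED) ' /tmp/sample.log | grep -oE '[0-9a-f]{16}' | head -1)
echo "compare: $pass_hash (passed) vs $fail_hash (flaked)"
```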
## Local Test Running

To run tests locally for verification:

```shell
# Run a specific test
yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With verbose logging
LOG_LEVEL=verbose yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With debug logging (very detailed)
LOG_LEVEL=debug yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'

# With specific module logging
LOG_LEVEL='info; debug:sequencer,p2p' yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
```
## Log Structure

### Timestamp Format

Logs use ISO timestamps (e.g., `2024-01-23T17:08:30.123Z`), which are useful for correlating events across nodes.

### Log Levels

- `ERROR` - Failures, exceptions
- `WARN` - Potential issues, recoverable problems
- `INFO` - Key events, state transitions
- `VERBOSE` - Detailed operational info
- `DEBUG` - Fine-grained debugging (very noisy)

### Component Prefixes

Log lines are prefixed with the component name (e.g., `aztec:sequencer`, `aztec:p2p`, `aztec:archiver`). These map to the Key Packages section in CLAUDE.md; use that as a reference for understanding what each component does.
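Filtering by component prefix narrows a huge log quickly. A minimal sketch over a few inlined sample lines (the messages are illustrative):

```shell
# Inlined sample of a combined log (illustrative lines)
cat > /tmp/combined.log <<'EOF'
2024-01-23T17:08:30.123Z aztec:sequencer INFO Proposing block 42
2024-01-23T17:08:30.456Z aztec:p2p DEBUG Gossiping tx
2024-01-23T17:08:31.001Z aztec:archiver INFO Synced to L1 block 100
2024-01-23T17:08:31.200Z aztec:sequencer ERROR Failed to propose block 43
EOF

# Keep only one component's lines, then count ERROR/WARN to surface problems
grep 'aztec:sequencer' /tmp/combined.log
grep -cE ' (ERROR|WARN) ' /tmp/combined.log   # prints 1
```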
## Multi-Node Debugging

E2E tests often spawn multiple nodes. Key tips:

### Identifying Nodes

- Look for node identifiers in log prefixes: `node-0`, `node-1`, `validator-0`, etc.
- Each node has its own log stream, but the streams are interleaved in the combined output
- Ask `analyze-logs` to filter by node when needed
### Cross-Node Correlation

- Use timestamps to correlate events across nodes
- Look for message propagation: "Node A sends X" → "Node B receives X"
- Check for missing events: if Node A sent but Node B never received, that's a clue
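The send/receive check can be sketched with plain grep over an inlined sample (the node prefixes and message text are illustrative, not the real log format):

```shell
# Inlined interleaved multi-node log (illustrative)
cat > /tmp/multinode.log <<'EOF'
2024-01-23T17:08:30.100Z node-0 aztec:p2p INFO Broadcast epoch proof e1
2024-01-23T17:08:30.900Z node-1 aztec:p2p INFO Received epoch proof e1
2024-01-23T17:08:45.000Z node-2 aztec:sequencer WARN Timed out waiting for epoch proof e1
EOF

# One broadcast but only one receipt: node-2 never got the message
grep -c 'Broadcast epoch proof e1' /tmp/multinode.log            # prints 1
grep 'Received epoch proof e1' /tmp/multinode.log | awk '{print $2}'  # prints node-1
```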
### Common Multi-Node Issues

- Network partition: nodes can't reach each other
- Clock skew: timestamps don't align, causing validation failures
- Split brain: nodes have divergent views of state
- Message ordering: events arrive in unexpected order
## Common Failure Patterns

### Timeout Failures

- An action executed, but the expected reaction never occurred
- Ask analyze-logs to find: what was the last action? What was expected next?
- Check for blocked operations, missing events, and stuck processes

### Assertion Failures

- A test expectation was not met
- Compare actual vs. expected values
- Trace back to find where the wrong value originated

### State Pollution

- A previous test left the system in a bad state
- Compare logs from the failing test with the same test run in isolation
- Check `beforeEach`/`afterEach` hooks

### Race Conditions

- Timing-dependent failures
- Look for operations completing in a different order
- Check for missing synchronization
## Example Session
User: Debug this flaky test: 343c52b17688d2cd
Claude: [Spawns identify-ci-failures subagent]
Returns: epochs_l1_reorgs.test.ts failed with timeout after 383s
Log downloaded to /tmp/343c52b17688d2cd.log
History shows test started flaking recently
Claude: The test "handles reorgs spanning epoch boundaries" timed out. Looking at
the history, this test has been flaking since yesterday. The failures correlate
with PR #19783 "adding MAXIMUM_GOSSIP_CLOCK_DISPARITY for p2p validation".
Let me check a passing run to compare. [Spawns analyze-logs for both]
User: What do you see as differences?
Claude: [Spawns analyze-logs with comparison prompt]
Summary shows: In failed run, node-2 never received epoch proof notification.
In passing run, all nodes received it within 2s of each other.
Hypothesis: The new clock disparity check may be rejecting valid messages
during epoch transitions when clocks drift slightly.
User: Can you check if there are any logs about clock validation?
Claude: [Spawns analyze-logs with specific prompt]
Found: "Rejecting gossip message: clock disparity 3.2s exceeds maximum 2s"
This confirms the hypothesis.
Claude: [Searches codebase for MAXIMUM_GOSSIP_CLOCK_DISPARITY]
Found in p2p/config.ts, default is 2 seconds. The PR added this validation
but epoch proofs can take longer to propagate. Recommend increasing the
disparity threshold for epoch-related messages.