adding-benchmarks

Add new benchmarks to the CI pipeline. Guides you through creating benchmark JSON files, integrating them with bootstrap.sh, and ensuring they are uploaded by CI via the ci3.yml workflow.

Install

mkdir -p .claude/skills/adding-benchmarks && curl -L -o skill.zip "https://mcp.directory/api/skills/download/5688" && unzip -o skill.zip -d .claude/skills/adding-benchmarks && rm skill.zip

Installs to .claude/skills/adding-benchmarks

About this skill

Adding Benchmarks

When to Use

Use this skill when:

Adding new performance benchmarks to a package
Creating benchmark tests that should be tracked over time
Integrating existing benchmarks into the CI pipeline

Benchmark System Overview

Benchmarks flow through the system as follows:

Generation: Each package produces bench-out/*.bench.json files
Aggregation: bench_merge in root bootstrap.sh combines all files, prefixing names with the package path
Upload: CI caches the merged JSON, and a GitHub Action uploads it to the benchmark dashboard
Display: Results appear on the dashboard with historical tracking

Live dashboard: https://aztecprotocol.github.io/benchmark-page-data/bench/?branch=next

How Benchmark Names Work

Name Construction

The final benchmark name combines two parts:

Package prefix (added automatically by bench_merge): Based on where the file lives
Local name (what you write in JSON): Your metric identifier
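
As a rough illustration of the two-part name, the sketch below prefixes each local name with a package path. It is not the actual bench_merge implementation: the helper names `package_prefix` and `merge`, the file name `tx.bench.json`, and the rule that the prefix is the directory containing bench-out/ are all assumptions made for this example.

```python
from pathlib import PurePosixPath


def package_prefix(bench_file: str) -> str:
    """Return the directory that contains bench-out/ (assumed prefix rule)."""
    parts = PurePosixPath(bench_file).parts
    # Everything before the "bench-out" directory is treated as the package path.
    return "/".join(parts[: parts.index("bench-out")])


def merge(files: dict[str, list[dict]]) -> list[dict]:
    """Combine per-package entries, prefixing each name with its package path."""
    merged = []
    for bench_file, entries in files.items():
        prefix = package_prefix(bench_file)
        for entry in entries:
            merged.append({**entry, "name": f"{prefix}/{entry['name']}"})
    return merged


merged = merge({
    # Hypothetical file name, used only for illustration.
    "yarn-project/stdlib/bench-out/tx.bench.json": [
        {"name": "Tx/private/getTxHash/avg", "value": 1.2, "unit": "ms"},
    ],
})
print(merged[0]["name"])
```

The local name stays exactly as written in the JSON file; only the prefix depends on where the file lives.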

Dashboard Grouping

The dashboard splits names by / to create a collapsible tree. The last segment becomes the chart name, and everything before it becomes the group hierarchy.

Full Name	Group Path	Chart Name
`yarn-project/stdlib/Tx/private/getTxHash/avg`	`yarn-project/stdlib/Tx/private/getTxHash`	`avg`
`yarn-project/kv-store/Map/Individual insertion`	`yarn-project/kv-store/Map`	`Individual insertion`
`barretenberg/sol/Add2HonkVerifier`	`barretenberg/sol`	`Add2HonkVerifier`
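
The split shown in the table can be reproduced with a few lines of Python. This is only a sketch of the rule described above (last segment is the chart, the rest is the group); `split_name` is a hypothetical helper, not dashboard code.

```python
def split_name(full_name: str) -> tuple[str, str]:
    """Split a full benchmark name into (group path, chart name)."""
    # rpartition splits on the last "/"; a flat name yields an empty group.
    group, _, chart = full_name.rpartition("/")
    return group, chart


for name in (
    "yarn-project/stdlib/Tx/private/getTxHash/avg",
    "yarn-project/kv-store/Map/Individual insertion",
    "barretenberg/sol/Add2HonkVerifier",
):
    print(split_name(name))
```

A name without any / ends up with an empty group path, which is why flat names give the tree nothing to collapse.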

Naming Best Practices

Use / to create logical groupings:

[
  {"name": "Tx/private/getTxHash/avg", "value": 1.2, "unit": "ms"},
  {"name": "Tx/private/getTxHash/p50", "value": 1.1, "unit": "ms"},
  {"name": "Tx/public/getTxHash/avg", "value": 2.3, "unit": "ms"}
]

Avoid flat names - they create no hierarchy and are hard to navigate:

[
  {"name": "tx_private_gettxhash_avg", "value": 1.2, "unit": "ms"}
]

Common suffixes:

Timing: avg, p50, p95, p99, min, max, total
Size: _opcodes, _gates, memory
Rate: gasPerSecond, jobs_per_sec
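
One way to produce several of the timing suffixes from raw samples is sketched below. The helper `timing_entries` and the sample values are made up for illustration; percentile suffixes such as p95 are left out because their exact definition is not specified here.

```python
import statistics


def timing_entries(prefix: str, samples_ms: list[float]) -> list[dict]:
    """Summarise raw timings as one entry per suffix (avg, p50, min, max, total)."""
    stats = {
        "avg": statistics.mean(samples_ms),
        "p50": statistics.median(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "total": sum(samples_ms),
    }
    # Each statistic becomes its own chart under the shared group prefix.
    return [
        {"name": f"{prefix}/{suffix}", "value": value, "unit": "ms"}
        for suffix, value in stats.items()
    ]


entries = timing_entries("Tx/private/getTxHash", [1.0, 2.0, 3.0])
for entry in entries:
    print(entry)
```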

Required JSON Format

All benchmark files must be arrays using the customSmallerIsBetter format:

[
  {"name": "category/metric_name", "value": 12345, "unit": "gas"},
  {"name": "category/another_metric", "value": 100.5, "unit": "ms"}
]

Rules:

Must be a JSON array [...], not an object
Each entry needs name, value, unit
value must be numeric (lower is better)
File must end with .bench.json

Optional fields (preserved by benchmark-action):

range (string): Variance info (e.g., "± 5%")
extra (string): Metadata — used for stacked chart grouping (see below)
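
The rules and optional fields above can be checked mechanically before a file reaches CI. The validator below is a minimal sketch of such a check; `validate_bench` is a hypothetical helper written for this example and is not part of the repository or of benchmark-action.

```python
import json
from numbers import Real

REQUIRED = ("name", "value", "unit")
OPTIONAL_STRINGS = ("range", "extra")


def validate_bench(text: str, filename: str) -> list[str]:
    """Return a list of rule violations; an empty list means the file looks valid."""
    errors = []
    if not filename.endswith(".bench.json"):
        errors.append("file name must end with .bench.json")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return errors + [f"invalid JSON: {exc}"]
    if not isinstance(data, list):
        return errors + ["top level must be a JSON array"]
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: must be an object")
            continue
        for key in REQUIRED:
            if key not in entry:
                errors.append(f"entry {i}: missing {key}")
        value = entry.get("value")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if "value" in entry and (isinstance(value, bool) or not isinstance(value, Real)):
            errors.append(f"entry {i}: value must be numeric")
        for key in OPTIONAL_STRINGS:
            if key in entry and not isinstance(entry[key], str):
                errors.append(f"entry {i}: {key} must be a string")
    return errors


good = '[{"name": "category/metric_name", "value": 12345, "unit": "gas"}]'
print(validate_bench(good, "my-component.bench.json"))
```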

Stacked Charts

To render multiple metrics as a single stacked area chart (e.g., component breakdowns), add an extra field with a stacked:GROUP_NAME value. Entries sharing the same GROUP_NAME are overlaid on one chart.

[
  {"name": "proving/cpus-8/total_ms", "value": 31663, "unit": "ms"},
  {"name": "proving/cpus-8/oink_prove_ms", "value": 4992, "unit": "ms", "extra": "stacked:proving/cpus-8/components"},
  {"name": "proving/cpus-8/sumcheck_ms", "value": 3318, "unit": "ms", "extra": "stacked:proving/cpus-8/components"},
  {"name": "proving/cpus-8/circuit_ms", "value": 4642, "unit": "ms", "extra": "stacked:proving/cpus-8/components"}
]

How it works:

extra: "stacked:GROUP_NAME" → entries with the same GROUP_NAME are rendered as one stacked chart
No extra field → individual line chart (default behavior)
Stacked entries still appear as individual charts on the main benchmark-action dashboard; the stacked view is rendered by a custom dashboard page
The GROUP_NAME becomes the chart title (after bench_merge prefixing, same as name)
The extra field is one of the 5 fields preserved by the benchmark-action Zod schema (name, value, unit, range, extra); any other custom fields will be stripped
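
To make the grouping rule concrete, the sketch below sorts entries into stacked groups and individual charts, and drops fields outside the five preserved ones. It only mirrors the behaviour described in this list; `group_charts` and `keep_known_fields` are hypothetical helpers, not code from the dashboard or from benchmark-action.

```python
from collections import defaultdict

STACKED_PREFIX = "stacked:"
KEPT_FIELDS = ("name", "value", "unit", "range", "extra")


def keep_known_fields(entry: dict) -> dict:
    """Drop any field that is not one of the five preserved ones."""
    return {k: v for k, v in entry.items() if k in KEPT_FIELDS}


def group_charts(entries: list[dict]) -> tuple[dict, list]:
    """Return (stacked groups keyed by GROUP_NAME, names of individual charts)."""
    stacked = defaultdict(list)
    individual = []
    for entry in entries:
        extra = entry.get("extra", "")
        if extra.startswith(STACKED_PREFIX):
            # Entries sharing the same GROUP_NAME land on the same chart.
            stacked[extra[len(STACKED_PREFIX):]].append(entry["name"])
        else:
            individual.append(entry["name"])
    return dict(stacked), individual


entries = [
    {"name": "proving/cpus-8/total_ms", "value": 31663, "unit": "ms"},
    {"name": "proving/cpus-8/oink_prove_ms", "value": 4992, "unit": "ms",
     "extra": "stacked:proving/cpus-8/components"},
    {"name": "proving/cpus-8/sumcheck_ms", "value": 3318, "unit": "ms",
     "extra": "stacked:proving/cpus-8/components"},
]
stacked, individual = group_charts(entries)
print(stacked)
print(individual)
```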

When to use stacked charts:

Component-level timing breakdowns (e.g., sumcheck, PCS, circuit construction)
Resource allocation views (e.g., memory by subsystem)
Any case where you want to see how a total decomposes into parts over time

Adding a New Benchmark

Step 1: Create the Benchmark

TypeScript (most common):

// my_bench.test.ts
import { Timer } from '@aztec/foundation/timer';
import { mkdir, writeFile } from 'fs/promises';
import { dirname } from 'path';

describe('MyComponent benchmarks', () => {
  // Entries collected by the tests below and written out once at the end.
  const results: { name: string; value: number; unit: string }[] = [];

  afterAll(async () => {
    // Only write a file when the caller requested one via BENCH_OUTPUT.
    if (process.env.BENCH_OUTPUT) {
      await mkdir(dirname(process.env.BENCH_OUTPUT), { recursive: true });
      await writeFile(process.env.BENCH_OUTPUT, JSON.stringify(results));
    }
  });

  it('benchmark operation', async () => {
    const timer = new Timer();
    // ... operation to benchmark ...
    results.push({ name: 'MyComponent/operation/avg', value: timer.ms(), unit: 'ms' });
  });
});

Shell (jq-based):

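mkdir -p bench-out
# Pass the shell values in with --argjson so jq inserts them as JSON numbers.
jq -n --argjson v1 "$VALUE1" --argjson v2 "$VALUE2" '[
  {name: "metric1", value: $v1, unit: "ms"},
  {name: "metric2", value: $v2, unit: "gas"}
]' > bench-out/my-component.bench.json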

Python:

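import json
import os

# Create the output directory first; open() does not create missing directories.
os.makedirs("bench-out", exist_ok=True)
benchmark_list = [{"name": "category/metric", "value": 12345, "unit": "gas"}]
with open("bench-out/my-component.bench.json", "w") as f:
    json.dump(benchmark_list, f)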

Step 2: Register in bootstrap.sh

Add to the package's bench_cmds function:

function bench_cmds {
  local hash=$(hash)
  echo "$hash BENCH_OUTPUT=bench-out/my_component.bench.json yarn-project/scripts/run_test.sh <package>/src/my_bench.test.ts"
}

Options: :ISOLATE=1, :CPUS=8, :MEM=16g, :TIMEOUT=7200

CPUS Suggestion: For long-running or compute-heavy benchmarks, allocate CPUs explicitly (:CPUS=N). Benchmarks are scheduled strictly, so if you request N CPUs, they are reserved for your run, which keeps results consistent.

ISOLATE Suggestion: Use :ISOLATE=1 when your benchmark needs a clean, isolated environment with no network access and pinned resources. This runs the test in a Docker container, ensuring reproducible results without interference from other processes.

MEM Suggestion: Use :MEM=Xg (e.g., :MEM=16g) for memory-intensive benchmarks that may exceed the default allocation (CPUS × 4GB). Pair with :ISOLATE=1 since memory limits are enforced via Docker.

TIMEOUT Suggestion: Use :TIMEOUT=N (in seconds) for benchmarks that take longer than the default timeout. For example, :TIMEOUT=1800 for 30 minutes, :TIMEOUT=7200 for 2 hours.
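
The arithmetic behind the MEM and TIMEOUT options is simple enough to sanity-check in a few lines. The helpers below are hypothetical and exist only for this example; they restate the default of CPUS × 4GB and the seconds-based timeout from the text above.

```python
def default_mem_gb(cpus: int) -> int:
    """Default memory allocation described above: CPUS x 4GB."""
    return cpus * 4


def timeout_seconds(minutes: int) -> int:
    """:TIMEOUT takes seconds, so convert from minutes."""
    return minutes * 60


print(default_mem_gb(8))     # default memory in GB for :CPUS=8
print(timeout_seconds(30))   # value to pass as :TIMEOUT for 30 minutes
print(timeout_seconds(120))  # value to pass as :TIMEOUT for 2 hours
```

Under this default, a benchmark requesting :CPUS=8 already gets 32GB, so :MEM only needs to be set when the benchmark needs something other than that.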

Important naming gotcha: Benchmark test files must use .bench.test.ts (with a dot before bench), NOT _bench.test.ts. The test discovery pattern [[ "$test" =~ \.bench\.test\.ts$ ]] specifically looks for .bench.test.ts.
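
The effect of that pattern can be checked with an equivalent regular expression in Python. This is a sketch of the match only, not the discovery script itself, and the file names are hypothetical.

```python
import re

# Same pattern as the bash test: a literal ".bench.test.ts" at the end of the path.
BENCH_TEST = re.compile(r"\.bench\.test\.ts$")


def is_discovered(path: str) -> bool:
    """Return True if the path would match the discovery pattern."""
    # search() is unanchored at the start, like bash's =~ operator.
    return BENCH_TEST.search(path) is not None


for candidate in ("src/tx/tx.bench.test.ts", "src/tx/tx_bench.test.ts"):
    print(candidate, is_discovered(candidate))
```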

Step 3: Verify

# Run locally
BENCH_OUTPUT=bench-out/test.bench.json yarn test src/my_bench.test.ts

# Validate JSON
jq . bench-out/test.bench.json
jq 'all(has("name") and has("value") and has("unit"))' bench-out/test.bench.json

CI Details

Benchmarks upload when:

PR has label: ci-merge-queue, ci-full, or ci-full-no-test-cache (publishes to target branch, i.e. next or a merge-train branch)
Running on merge queue (publishes with next)

10-commit visibility window: The dashboard only shows benchmarks that ran in the last 10 commits. If a benchmark stops running, it disappears after ~10 merges.
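
The visibility window can be pictured as a union over the most recent commits. The sketch below models that idea under the assumption that a benchmark is shown if it ran in any of the last 10 commits; `visible_benchmarks` and the benchmark names are hypothetical, and the real dashboard logic may differ.

```python
WINDOW = 10


def visible_benchmarks(history: list[set[str]]) -> set[str]:
    """history is ordered oldest to newest, one set of benchmark names per commit."""
    recent = history[-WINDOW:]
    # A benchmark is visible if it ran in at least one commit inside the window.
    return set().union(*recent)


# "old" last ran 10 commits ago, so it has fallen out of the window.
dropped = visible_benchmarks([{"old", "new"}] + [{"new"}] * 10)
# "old" last ran 9 commits ago, so it is still inside the window.
kept = visible_benchmarks([{"old", "new"}] + [{"new"}] * 9)
print(sorted(dropped), sorted(kept))
```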

Reference Implementations

TypeScript: yarn-project/stdlib/src/tx/tx_bench.test.ts
Python: l1-contracts/scripts/generate_benchmark_json.py
Shell: yarn-project/p2p/testbench/consolidate_benchmarks.sh
Circuits: noir-projects/noir-protocol-circuits/scripts/run_bench.sh

Author

AztecProtocol
