evals-context
Provides context about the Roo Code evals system structure in this monorepo. Use when tasks mention "evals", "evaluation", "eval runs", "eval exercises", or working with the evals infrastructure. Helps distinguish between the evals execution system (packages/evals, apps/web-evals) and the public website evals display page (apps/web-roo-code/src/app/evals).
Install
mkdir -p .claude/skills/evals-context && curl -L -o skill.zip "https://mcp.directory/api/skills/download/2616" && unzip -o skill.zip -d .claude/skills/evals-context && rm skill.zip
Installs to .claude/skills/evals-context
About this skill
Evals Codebase Context
When to Use This Skill
Use this skill when the task involves:
- Modifying or debugging the evals execution infrastructure
- Adding new eval exercises or languages
- Working with the evals web interface (apps/web-evals)
- Modifying the public evals display page on roocode.com
- Understanding where evals code lives in this monorepo
When NOT to Use This Skill
Do NOT use this skill when:
- Working on unrelated parts of the codebase (extension, webview-ui, etc.)
- The task is purely about the VS Code extension's core functionality
- Working on the main website pages that don't involve evals
Key Disambiguation: Two "Evals" Locations
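This monorepo contains three evals-related locations that are easy to confuse. They fall into two groups: the evals execution system (packages/evals/ and apps/web-evals/) and the public website evals page (apps/web-roo-code/src/app/evals/). The exercises themselves live in a separate repository: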
| Component | Path | Purpose |
|---|---|---|
| Evals Execution System | packages/evals/ | Core eval infrastructure: CLI, DB schema, Docker configs |
| Evals Management UI | apps/web-evals/ | Next.js app for creating/monitoring eval runs (localhost:3446) |
| Website Evals Page | apps/web-roo-code/src/app/evals/ | Public roocode.com page displaying eval results |
| External Exercises Repo | Roo-Code-Evals | Actual coding exercises (NOT in this monorepo) |
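As a quick illustration of the table above, the sketch below maps a repo-relative file path to the evals area it belongs to. The function name `classifyEvalsPath` and the area labels are hypothetical and exist only for this example; only the path prefixes come from the table.

```typescript
// Hypothetical helper: the area labels and function name are illustrative.
// Only the three path prefixes are taken from the table above.
type EvalsArea = "execution-system" | "management-ui" | "website-page" | "not-evals";

const AREA_PREFIXES: ReadonlyArray<readonly [string, EvalsArea]> = [
  ["packages/evals/", "execution-system"],
  ["apps/web-evals/", "management-ui"],
  ["apps/web-roo-code/src/app/evals/", "website-page"],
];

function classifyEvalsPath(repoRelativePath: string): EvalsArea {
  // Normalize Windows separators and a leading "./" so prefixes compare cleanly.
  const normalized = repoRelativePath.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [prefix, area] of AREA_PREFIXES) {
    if (normalized.startsWith(prefix)) return area;
  }
  return "not-evals";
}
```

Note that other pages under apps/web-roo-code/ fall through to "not-evals", which matches the guidance on when not to use this skill.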
Directory Structure Reference
packages/evals/ - Core Evals Package
packages/evals/
├── ARCHITECTURE.md # Detailed architecture documentation
├── ADDING-EVALS.md # Guide for adding new exercises/languages
├── README.md # Setup and running instructions
├── docker-compose.yml # Container orchestration
├── Dockerfile.runner # Runner container definition
├── Dockerfile.web # Web app container
├── drizzle.config.ts # Database ORM config
├── src/
│   ├── index.ts              # Package exports
│   ├── cli/                  # CLI commands for running evals
│   │   ├── runEvals.ts       # Orchestrates complete eval runs
│   │   ├── runTask.ts        # Executes individual tasks in containers
│   │   ├── runUnitTest.ts    # Validates task completion via tests
│   │   └── redis.ts          # Redis pub/sub integration
│   ├── db/
│   │   ├── schema.ts         # Database schema (runs, tasks)
│   │   ├── queries/          # Database query functions
│   │   └── migrations/       # SQL migrations
│   └── exercises/
│       └── index.ts          # Exercise loading utilities
└── scripts/
    └── setup.sh              # Local macOS setup script
apps/web-evals/ - Evals Management Web App
apps/web-evals/
├── src/
│   ├── app/
│   │   ├── page.tsx          # Home page (runs list)
│   │   ├── runs/
│   │   │   ├── new/          # Create new eval run
│   │   │   └── [id]/         # View specific run status
│   │   └── api/runs/         # SSE streaming endpoint
│   ├── actions/              # Server actions
│   │   ├── runs.ts           # Run CRUD operations
│   │   ├── tasks.ts          # Task queries
│   │   ├── exercises.ts      # Exercise listing
│   │   └── heartbeat.ts      # Controller health checks
│   ├── hooks/                # React hooks (SSE, models, etc.)
│   └── lib/                  # Utilities and schemas
apps/web-roo-code/src/app/evals/ - Public Website Evals Page
apps/web-roo-code/src/app/evals/
├── page.tsx # Fetches and displays public eval results
├── evals.tsx # Main evals display component
├── plot.tsx # Visualization component
└── types.ts # EvalRun type (extends packages/evals types)
This page displays eval results on the public roocode.com website. It imports types from @roo-code/evals but does NOT run evals.
Architecture Overview
The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:
┌─────────────────────────────────────────────────────────────┐
│ Web App (apps/web-evals) ──────────────────────────────── │
│ │ │
│ ▼ │
│ PostgreSQL ◄────► Controller Container │
│ │ │ │
│ ▼ ▼ │
│ Redis ◄───► Runner Containers (1-25 parallel) │
└─────────────────────────────────────────────────────────────┘
Key components:
- Controller: Orchestrates eval runs, spawns runners, manages task queue (p-queue)
- Runner: Isolated Docker container with VS Code + Roo Code extension + language runtimes
- Redis: Pub/sub for real-time events (NOT task queuing)
- PostgreSQL: Stores runs, tasks, metrics
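To make the controller's role more concrete, here is a minimal sketch of bounded-concurrency task execution, the general idea behind capping the number of parallel runners. It is a hand-rolled stand-in written for this example: it does not use p-queue's API and is not the controller's actual code. The name `runWithConcurrency` is hypothetical.

```typescript
// Illustrative stand-in for a bounded task queue.
// NOT p-queue's API and NOT the evals controller implementation.
async function runWithConcurrency<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  concurrency: number,
): Promise<T[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }
  const results: T[] = new Array<T>(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker repeatedly claims the next unstarted task until none remain,
  // so at most `concurrency` tasks are in flight at any time.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      const task = tasks[index];
      if (task) {
        results[index] = await task();
      }
    }
  }

  const workerCount = Math.min(concurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results keep the order of the input tasks
}
```

In this sketch a rejected task rejects the overall promise; a real controller would more likely record the failure per task and continue with the rest of the run.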
Common Tasks Quick Reference
Adding a New Eval Exercise
- Add exercise to Roo-Code-Evals repo (external)
- See packages/evals/ADDING-EVALS.md for structure
Modifying Eval CLI Behavior
Edit files in packages/evals/src/cli/:
- runEvals.ts - Run orchestration
- runTask.ts - Task execution
- runUnitTest.ts - Test validation
Modifying the Evals Web Interface
Edit files in apps/web-evals/src/:
- app/runs/new/new-run.tsx - New run form
- actions/runs.ts - Run server actions
Modifying the Public Evals Display Page
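Edit files in apps/web-roo-code/src/app/evals/:
- page.tsx - Fetches and displays public eval results
- evals.tsx - Main evals display component
- plot.tsx - Visualization component
- types.ts - EvalRun type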
Database Schema Changes
- Edit packages/evals/src/db/schema.ts
- Generate migration: cd packages/evals && pnpm drizzle-kit generate
- Apply migration: pnpm drizzle-kit migrate
Running Evals Locally
# From repo root
pnpm evals
# Opens web UI at http://localhost:3446
Ports (defaults):
- PostgreSQL: 5433
- Redis: 6380
- Web: 3446
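As a small convenience sketch, the default ports above can be kept in one place and turned into local endpoint strings. The helper name `localEndpoint` and the override parameter are hypothetical. Credentials and database names are not given in this document, so the sketch leaves them out; the `postgres://` and `redis://` prefixes are the conventional URL schemes for those services, not values taken from the repo.

```typescript
// Default ports as listed above; everything else here is illustrative.
const DEFAULT_PORTS = { postgres: 5433, redis: 6380, web: 3446 } as const;

type Service = keyof typeof DEFAULT_PORTS;

// Conventional URL schemes (assumption: not taken from the repo's config).
const SCHEMES: Record<Service, string> = {
  postgres: "postgres",
  redis: "redis",
  web: "http",
};

// Hypothetical helper: builds a localhost endpoint, optionally overriding a port.
function localEndpoint(
  service: Service,
  overrides: Partial<Record<Service, number>> = {},
): string {
  const port = overrides[service] ?? DEFAULT_PORTS[service];
  return `${SCHEMES[service]}://localhost:${port}`;
}
```

For example, `localEndpoint("web")` yields the same http://localhost:3446 address that `pnpm evals` opens.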
Testing
# packages/evals tests
cd packages/evals && npx vitest run
# apps/web-evals tests
cd apps/web-evals && npx vitest run
Key Types/Exports from @roo-code/evals
The package exports are defined in packages/evals/src/index.ts:
- Database queries: getRuns, getTasks, getTaskMetrics, etc.
- Schema types: Run, Task, TaskMetrics
- Used by both apps/web-evals and apps/web-roo-code