doc-testing
Comprehensive guide for writing tests in magenta.nvim, including test environment setup, mock providers, driver interactions, and best practices
Install
mkdir -p .claude/skills/doc-testing && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3175" && unzip -o skill.zip -d .claude/skills/doc-testing && rm skill.zip

Installs to .claude/skills/doc-testing
About this skill
Testing in magenta.nvim
To run the full test suite, use npx vitest run from the project root. To run a specific test file, use npx vitest run <file>. In either case, you do not need to cd first.
Test files should use the .test.ts extension (e.g., myFeature.test.ts).
Tests should make use of the node/test/preamble.ts helpers.
When doing integration-level testing, like user flows, use the withDriver helper and the interactions in node/test/driver.ts. When performing generic user actions that may be reusable between tests, put them into the NvimDriver class as helpers.
As of July 2025, tests run in parallel for improved performance; the test infrastructure supports concurrent test execution.
Test Environment Setup
Fixture Files & Directory Structure:
- Each test gets a fresh temporary directory in /tmp/magenta-test/{testId}/
- Files from node/test/fixtures/ are copied into this temp directory for each test
- Available fixture files include poem.txt, test.jpg, sample2.pdf, test.bin, and others
- Nvim runs in this temporary directory, so files can be safely mutated during tests
- The temp directory is automatically cleaned up after each test - no manual cleanup needed
- Use await getcwd(driver.nvim) to get the current working directory for file path operations
- The temporary directory is completely isolated between tests
Test Pattern:
import { withDriver } from "../test/preamble";
test("my test", async () => {
await withDriver({}, async (driver) => {
// Test code here - nvim runs in temp dir with fixture files
// Access cwd with: const cwd = await getcwd(driver.nvim)
});
});
Custom File Setup:
test("test with custom files", async () => {
await withDriver(
{
setupFiles: async (tmpDir) => {
const fs = await import("fs/promises");
const path = await import("path");
await fs.writeFile(path.join(tmpDir, "custom.txt"), "content");
await fs.mkdir(path.join(tmpDir, "subfolder"));
},
},
async (driver) => {
// Custom files are now available in the test environment
},
);
});
Directory Structure:
The test environment creates an isolated directory structure:
- baseDir: /tmp/magenta-test/{testId}/ - root of all test directories
- tmpDir: {baseDir}/cwd/ - the working directory where nvim runs (fixtures copied here)
- homeDir: {baseDir}/home/ - simulated home directory ($HOME is set to this)
The withDriver callback receives a dirs object with all three paths:
await withDriver({}, async (driver, dirs) => {
console.log(dirs.tmpDir); // /tmp/magenta-test/abc123/cwd
console.log(dirs.homeDir); // /tmp/magenta-test/abc123/home
console.log(dirs.baseDir); // /tmp/magenta-test/abc123
});
Setting Up Home Directory Files:
Use setupHome to create files in the simulated home directory. This is useful for testing features that read from ~/.magenta/ or other home directory paths:
test("test with home directory config", async () => {
await withDriver(
{
setupHome: async (homeDir) => {
const fs = await import("fs/promises");
const path = await import("path");
// Create ~/.magenta/options.json
const magentaDir = path.join(homeDir, ".magenta");
await fs.mkdir(magentaDir, { recursive: true });
await fs.writeFile(
path.join(magentaDir, "options.json"),
JSON.stringify({
filePermissions: [{ path: "~/Documents", read: true }],
}),
);
},
},
async (driver) => {
// Magenta will load options from the simulated ~/.magenta/options.json
},
);
});
Setting Up Directories Outside CWD:
Use setupExtraDirs to create directories outside the working directory. This is useful for testing file permission boundaries:
test("test with external directories", async () => {
let outsidePath: string;
await withDriver(
{
setupExtraDirs: async (baseDir) => {
const fs = await import("fs/promises");
const path = await import("path");
// Create a directory outside cwd
outsidePath = path.join(baseDir, "outside");
await fs.mkdir(outsidePath, { recursive: true });
await fs.writeFile(path.join(outsidePath, "secret.txt"), "secret");
},
},
async (driver, dirs) => {
// outsidePath is outside dirs.tmpDir, so file access should be restricted
// unless explicitly permitted via filePermissions
},
);
});
Combined Setup for Permission Testing:
A common pattern for testing file permissions is to use both setupExtraDirs and setupHome together:
test("can access external dir with filePermissions", async () => {
let outsidePath: string;
await withDriver(
{
setupExtraDirs: async (baseDir) => {
const fs = await import("fs/promises");
const path = await import("path");
outsidePath = path.join(baseDir, "outside");
await fs.mkdir(outsidePath, { recursive: true });
await fs.writeFile(path.join(outsidePath, "allowed.txt"), "content");
// Write options.json here since we now have the path
const homeDir = path.join(baseDir, "home");
const magentaDir = path.join(homeDir, ".magenta");
await fs.mkdir(magentaDir, { recursive: true });
await fs.writeFile(
path.join(magentaDir, "options.json"),
JSON.stringify({
filePermissions: [{ path: outsidePath, read: true }],
}),
);
},
},
async (driver) => {
// Tools can now access outsidePath due to filePermissions
},
);
});
Available Mocks & Test Interactions
Configuring Magenta Options:
Tests can override magenta options by passing them to withDriver:
test("test with custom options", async () => {
await withDriver(
{
options: {
getFileAutoAllowGlobs: ["*.log", "config/*"],
changeDebounceMs: 100,
// Any other MagentaOptions can be overridden here
},
},
async (driver) => {
// Magenta will use the custom options
},
);
});
Available options include:
- getFileAutoAllowGlobs - Array of glob patterns for auto-allowing file reads
- changeDebounceMs - Override the default change tracking debounce
- Any other options from the MagentaOptions type
Mock Provider Interactions:
The mock provider (driver.mockAnthropic) uses MockStream objects that mirror Anthropic's streaming API. Streams contain Anthropic-formatted messages (Anthropic.MessageParam[]), not our internal ProviderMessage[] format.
Required Type Imports for Tests:
import type Anthropic from "@anthropic-ai/sdk";
type ToolResultBlockParam = Anthropic.Messages.ToolResultBlockParam;
type ContentBlockParam = Anthropic.Messages.ContentBlockParam;
type TextBlockParam = Anthropic.Messages.TextBlockParam;
type DocumentBlockParam = Anthropic.Messages.DocumentBlockParam;
Awaiting Streams:
// Wait for any pending stream
const stream = await driver.mockAnthropic.awaitPendingStream();
// Wait for stream with specific text in message content
const stream =
await driver.mockAnthropic.awaitPendingStreamWithText("specific text");
// Wait for user message (tool results, etc.)
const stream = await driver.mockAnthropic.awaitPendingUserRequest();
// Wait for forced tool use requests
const forceRequest =
await driver.mockAnthropic.awaitPendingForceToolUseRequest();
// Check if there's a pending stream with specific text (non-blocking)
const hasPending = driver.mockAnthropic.hasPendingStreamWithText("text");
Responding to Streams:
// Simple text response
stream.respond({
stopReason: "end_turn",
text: "Response text",
toolRequests: [],
});
// Response with tool use
stream.respond({
stopReason: "tool_use",
text: "I'll use a tool",
toolRequests: [
{
status: "ok",
value: {
id: "tool_id" as ToolRequestId,
toolName: "get_file" as ToolName,
input: { filePath: "./file.txt" as UnresolvedFilePath },
},
},
],
});
// Response with error tool request
stream.respond({
stopReason: "tool_use",
text: "Tool failed",
toolRequests: [
{
status: "error",
rawRequest: { invalid: "request" },
},
],
});
Responding to Force Tool Use Requests:
const forceRequest =
await driver.mockAnthropic.awaitPendingForceToolUseRequest();
// Successful tool response
await driver.mockAnthropic.respondToForceToolUse({
toolRequest: {
status: "ok",
value: {
id: "tool_id" as ToolRequestId,
toolName: "get_file" as ToolName,
input: { filePath: "./file.txt" as UnresolvedFilePath },
},
},
stopReason: "tool_use",
});
// Error tool response
await driver.mockAnthropic.respondToForceToolUse({
toolRequest: {
status: "error",
rawRequest: { invalid: "data" },
},
stopReason: "tool_use",
});
Stream Inspection:
// Access stream properties (Anthropic format)
console.log(stream.messages); // Anthropic.MessageParam[] - raw Anthropic format
console.log(stream.getProviderMessages()); // ProviderMessage[] - converted format
console.log(stream.systemPrompt); // System prompt (if any)
// For force tool use requests
console.log(forceRequest.spec); // Tool specification
console.log(forceRequest.model); // Model used
console.log(forceRequest.messages); // Message history
// Check if stream was aborted
if (stream.aborted) {
// Handle aborted stream
}
Advanced Response Patterns:
// Stream individual parts of response
stream.streamText("First part of response");
stream.streamToolUse(toolId, toolName, input);
stream.streamThinking("Thinking content", "signature");
stream.finishResponse("end_turn");
// Respond with errors
stream.respondWithError(new Error("Something went wrong"));
Mock Provider:
- driver.mockAnthropic - Pre-configured mock provider that captures all streams
- await driver.mockAnthropic.awaitPendingStream() - Wait for regular message streams
- await driver.mockAnthropic.awaitPendingStreamWithText("text") - Wait for a stream containing specific text
- await driver.mockAnthropic.awaitPendingForceToolUseRequest() - Wait for forced tool use requests
- await driver.mockAnthropic.respondToForceToolUse({...}) - Send mock responses
- No need to manually mock providers - they're already set up in the test infrastructure
Tool Result Content Structure (Important!):
Anthropic's ToolResultBlockParam has a different structure than our internal ProviderToolResult:
// Our internal format (ProviderToolResult):
{
type: "tool_result",
id: ToolRequestId,
result: {
status: "ok" | "error",
value: ProviderToolResultContent[], // nested here
error?: string,
}
}
// Anthropic format (ToolResultBlockParam) - what you see in stream.messages:
{
type: "tool_result",
tool_use_id: string, // different field name!
content: string | ContentBlockParam[], // different field name!
is_error?: boolean, // different error indicator!
}
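To make the mapping between the two shapes concrete, here is a minimal, hypothetical converter sketch. This is not magenta's actual converter - the types are reduced to just the fields discussed above:

```typescript
// Hypothetical sketch: reduced versions of the two shapes discussed above.
type InternalToolResult = {
  type: "tool_result";
  id: string;
  result:
    | { status: "ok"; value: { type: "text"; text: string }[] }
    | { status: "error"; error: string };
};

type AnthropicToolResult = {
  type: "tool_result";
  tool_use_id: string; // internal `id` maps here
  content: string | { type: "text"; text: string }[]; // internal `result.value` maps here
  is_error?: boolean; // internal `status: "error"` maps here
};

function toAnthropicShape(r: InternalToolResult): AnthropicToolResult {
  if (r.result.status === "ok") {
    return { type: "tool_result", tool_use_id: r.id, content: r.result.value };
  }
  return {
    type: "tool_result",
    tool_use_id: r.id,
    content: r.result.error,
    is_error: true,
  };
}
```

The field renames (id → tool_use_id, nested result.value → flat content, status → is_error) are exactly what trips up assertions written against the wrong format.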
Document Blocks are Siblings, Not Nested:
When documents are sent to Anthropic, they appear as sibling blocks in the user message, not nested inside tool_result.content:
// User message content array:
[
{ type: "tool_result", tool_use_id: "...", content: [], is_error: false },
{ type: "document", source: {...}, title: "..." } // <-- sibling, not nested!
]
Finding Tool Results in Stream Messages:
const stream = await driver.mockAnthropic.awaitPendingStream();
// Find user message containing the tool result
let userMessageContent: ContentBlockParam[] | undefined;
for (const msg of stream.messages) {
if (msg.role === "user" && Array.isArray(msg.content)) {
const content = msg.content as ContentBlockParam[];
const hasToolResult = content.some(
(block: ContentBlockParam) => block.type === "tool_result",
);
if (hasToolResult) userMessageContent = content;
}
}
// Get the tool result block
const toolResult = userMessageContent!.find(
(block: ContentBlockParam) => block.type === "tool_result",
) as ToolResultBlockParam;
// Check for errors
expect(toolResult.is_error).toBeFalsy();
// Access content (note: might be string or array)
if (Array.isArray(toolResult.content)) {
const textContent = toolResult.content.find(
(item: ContentBlockParam) => item.type === "text",
) as TextBlockParam;
}
Checking Error Results:
// Anthropic format for errors:
expect(toolResult.is_error).toBe(true);
const errorContent =
typeof toolResult.content === "string"
? toolResult.content
: JSON.stringify(toolResult.content);
expect(errorContent).toContain("expected error message");
Driver Interactions (prefer these over internal API access):
- await driver.editFile("poem.txt") - Open fixture files
- await driver.command("normal! gg") - Execute vim commands
- await driver.magenta.command("predict-edit") - Execute magenta commands
- Use real nvim interactions to trigger change tracking naturally
Testing Best Practices:
- DO: Use realistic nvim interactions (driver.editFile(), driver.command())
- DON'T: Reach into internal APIs (driver.magenta.changeTracker.onTextDocumentDidChange())
- DO: Let the system work naturally - make real edits and let change tracking happen
- DO: Write integration tests that exercise the full user flow
- DON'T: Mock internal components - use the provided driver and mock provider
Change Tracker Testing:
- DO: Use driver.assertChangeTrackerHasEdits(count) and driver.assertChangeTrackerContains(changes) instead of arbitrary timeouts
- DO: Be aware that rapid edits may be batched into single changes by the tracker
- DO: Use explicit assertions about what changes should be tracked rather than waiting fixed amounts of time
- DON'T: Use setTimeout() or fixed delays when waiting for change tracking - use the assertion methods instead
Mock Stream Objects:
Streams captured by awaitPendingStream() contain:
- stream.messages - Anthropic.MessageParam[] (raw Anthropic format)
- stream.getProviderMessages() - ProviderMessage[] (converted format for easier assertions)
- stream.systemPrompt - The system prompt used (if any)
- stream.aborted - Whether the stream was aborted
- stream.resolved - Whether the stream has finished
Force tool use requests captured by awaitPendingForceToolUseRequest() contain:
- request.spec - The tool specification used
- request.model - Which model was requested
- request.messages - The messages array containing the user/assistant conversation
- request.systemPrompt - The system prompt used (if any)
- request.defer - Promise resolution control
Type Narrowing with expect():
expect() assertions don't narrow TypeScript's discriminated unions. Add explicit guards:
expect(documentContent.source.type).toBe("base64");
// This doesn't narrow the type, so add:
if (documentContent.source.type !== "base64")
throw new Error("Expected base64 source");
// Now TypeScript knows source has media_type and data
expect(documentContent.source.media_type).toBe("application/pdf");
System Reminders in Mock Streams:
System reminders are an internal ProviderMessage type (system_reminder) that get converted to plain text blocks with <system-reminder> tags when sent to Anthropic:
// In tests checking mock stream messages, search for text blocks containing the tag:
function findSystemReminderText(
content: string | ContentBlockParam[],
): TextBlockParam | undefined {
if (typeof content === "string") return undefined;
return content.find(
(c): c is TextBlockParam =>
c.type === "text" && c.text.includes("<system-reminder>"),
);
}
System Prompt vs User Messages: When implementing AI features, maintain proper separation:
- System prompt: General instructions about the agent's role and behavior ("You have to do your best to predict...")
- User messages: Specific contextual data (buffer content, cursor position, recent changes)

This separation keeps the system prompt focused on behavior while allowing dynamic context in messages.
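A minimal sketch of this separation, with hypothetical names (ChatRequest and buildPredictionRequest are illustrative, not part of magenta):

```typescript
// Hypothetical sketch: behavior lives in the system prompt,
// per-request context lives in the user message.
type ChatRequest = {
  systemPrompt: string;
  messages: { role: "user" | "assistant"; content: string }[];
};

// Behavior only - stable across requests.
const SYSTEM_PROMPT =
  "You are an edit-prediction agent. Predict the user's next edit.";

function buildPredictionRequest(
  bufferContent: string,
  cursorLine: number,
): ChatRequest {
  return {
    systemPrompt: SYSTEM_PROMPT,
    messages: [
      {
        role: "user",
        // Dynamic context goes in the message, not the system prompt.
        content: `Buffer:\n${bufferContent}\nCursor line: ${cursorLine}`,
      },
    ],
  };
}
```

Because the system prompt never changes per request, tests can assert on it once, while context assertions target the user message.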
Test Writing Best Practices
Avoid Conditional Expect Statements
DON'T write tests with conditional expects like this:
if (toolResult && toolResult.type === "tool_result") {
expect(toolResult.result.status).toBe("ok");
if (toolResult.result.status === "ok") {
const textContent = toolResult.result.value.find(
(item) => item.type === "text",
);
if (textContent && textContent.type === "text") {
expect(textContent.text).toContain("expected content");
}
}
}
DO use TypeScript type assertions and direct expects:
const toolResult = toolResultMessage.content[0] as Extract<
(typeof toolResultMessage.content)[0],
{ type: "tool_result" }
>;
expect(toolResult.type).toBe("tool_result");
expect(toolResult.result.status).toBe("ok");
const result = toolResult.result as Extract<
typeof toolResult.result,
{ status: "ok" }
>;
const textContent = result.value.find(
(item) => item.type === "text",
) as Extract<(typeof result.value)[0], { type: "text" }>;
expect(textContent).toBeDefined();
expect(textContent.text).toContain("expected content");
TypeScript Type Narrowing in Tests
Use TypeScript's Extract utility type to narrow union types safely:
// For narrowing message content
const toolResult = content[0] as Extract<
(typeof content)[0],
{ type: "tool_result" }
>;
// For narrowing result status
const okResult = toolResult.result as Extract<
typeof toolResult.result,
{ status: "ok" }
>;
const errorResult = toolResult.result as Extract<
typeof toolResult.result,
{ status: "error" }
>;
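The pattern can be seen end-to-end in a self-contained illustration (a generic union type standing in for the real message content types):

```typescript
// Self-contained demonstration of Extract-based narrowing on a
// discriminated union like those used in the tests above.
type ToolResultShape =
  | { status: "ok"; value: { type: "text"; text: string }[] }
  | { status: "error"; error: string };

const result: ToolResultShape = {
  status: "ok",
  value: [{ type: "text", text: "expected content" }],
};

// The cast tells TypeScript which branch we expect; the runtime check
// keeps the test honest if the shape ever changes.
const okResult = result as Extract<ToolResultShape, { status: "ok" }>;
if (okResult.status !== "ok") throw new Error("expected ok result");

const textContent = okResult.value.find(
  (item) => item.type === "text",
) as Extract<(typeof okResult.value)[0], { type: "text" }>;
```

Pairing each cast with a runtime assertion is what makes this safer than a bare `as`: the types stay narrow for the compiler while the test still fails loudly on a shape mismatch.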
Test Structure Patterns
Basic Test Structure
it("should do something", async () => {
await withDriver({}, async (driver) => {
await driver.showSidebar();
// Trigger the action
await driver.inputMagentaText(`Some command`);
await driver.send();
// Mock the response
const request = await driver.mockAnthropic.awaitPendingRequest();
request.respond({
stopReason: "tool_use",
text: "response text",
toolRequests: [
/* tool requests */
],
});
// Assert the UI state
await driver.assertDisplayBufferContains("Expected UI text");
// Handle tool result and verify
const toolResultRequest = await driver.mockAnthropic.awaitPendingRequest();
const toolResultMessage =
toolResultRequest.messages[toolResultRequest.messages.length - 1];
// Type-safe assertions
expect(toolResultMessage.role).toBe("user");
expect(Array.isArray(toolResultMessage.content)).toBe(true);
const toolResult = toolResultMessage.content[0] as Extract<
(typeof toolResultMessage.content)[0],
{ type: "tool_result" }
>;
expect(toolResult.type).toBe("tool_result");
expect(toolResult.result.status).toBe("ok");
});
});
Tests with File Setup
it("should handle custom files", async () => {
await withDriver(
{
setupFiles: async (tmpDir) => {
const fs = await import("fs/promises");
const path = await import("path");
await fs.writeFile(path.join(tmpDir, "test.txt"), "content");
},
},
async (driver) => {
// Test implementation
},
);
});
Tests with Custom Options
it("should respect configuration", async () => {
await withDriver(
{
options: {
someOption: ["value1", "value2"],
},
},
async (driver) => {
// Test implementation
},
);
});
Mock Provider Patterns
Awaiting Streams
// Wait for regular streams
const stream = await driver.mockAnthropic.awaitPendingStream();
// Wait for forced tool use requests
const forceRequest =
await driver.mockAnthropic.awaitPendingForceToolUseRequest();
Responding to Streams
// Simple response
stream.respond({
stopReason: "end_turn",
text: "Response text",
toolRequests: [],
});
// Response with tool use
stream.respond({
stopReason: "tool_use",
text: "I'll use a tool",
toolRequests: [
{
status: "ok",
value: {
id: "tool_id" as ToolRequestId,
toolName: "tool_name" as ToolName,
input: { param: "value" },
},
},
],
});
Common Assertion Patterns
UI Assertions
// Check for presence
await driver.assertDisplayBufferContains("Expected text");
// Check for absence
await driver.assertDisplayBufferDoesNotContain("Unwanted text");
// Get position for interactions
const buttonPos = await driver.assertDisplayBufferContains("[ YES ]");
await driver.triggerDisplayBufferKey(buttonPos, "<CR>");
Tool Result Assertions
// Use helper functions when available
assertToolResultContainsText(toolResult, "expected text");
assertToolResultHasImageSource(toolResult, "image/jpeg");
// Manual assertions for specific cases
const result = toolResult.result as Extract<
typeof toolResult.result,
{ status: "ok" }
>;
const textContent = result.value.find(
(item) => item.type === "text",
) as Extract<(typeof result.value)[0], { type: "text" }>;
expect(textContent.text).toContain("expected content");
Change Tracker Assertions
// Use specific assertions instead of timeouts
await driver.assertChangeTrackerHasEdits(2);
await driver.assertChangeTrackerContains([
{ type: "edit", filePath: "file.txt" },
]);
// DON'T use arbitrary timeouts
// await new Promise(resolve => setTimeout(resolve, 1000)); // ❌
Testing Best Practices
Integration Over Unit
- Prefer testing complete user flows over isolated units
- Use realistic nvim interactions rather than reaching into internal APIs
- Let the system work naturally (e.g., let change tracking happen through real edits)
Mock Boundaries
- Mock external services (Anthropic API) but not internal components
- Use the provided driver and mock infrastructure
- Don't manually mock internal classes or methods
Realistic Interactions
// DO: Use realistic interactions
await driver.editFile("poem.txt");
await driver.command("normal! gg");
// DON'T: Reach into internals
// driver.magenta.changeTracker.onTextDocumentDidChange(...); // ❌
File Handling
- Each test gets a fresh temporary directory
- Fixture files are automatically copied for each test
- Files can be safely mutated during tests
- Use the setupFiles callback for custom file creation
Error Testing
- Test both success and error paths
- Verify error messages are meaningful
- Test edge cases like invalid input, missing files, etc.
Async Patterns
- Always await async operations
- Use the driver's assertion methods that handle timing
- Don't use fixed delays unless absolutely necessary
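When no driver assertion fits and a fixed delay seems tempting, polling a condition with a deadline is preferable. A hedged sketch (pollUntil is a hypothetical helper, not part of the driver API):

```typescript
// Hypothetical helper: retry a predicate until it passes or a deadline
// expires, instead of sleeping for a fixed amount of time.
async function pollUntil(
  predicate: () => boolean,
  timeoutMs = 1000,
  intervalMs = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (predicate()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`pollUntil timed out after ${timeoutMs}ms`);
}
```

This returns as soon as the condition holds, so fast runs stay fast, and slow runs fail with a clear timeout instead of a flaky assertion.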