doc-testing

Comprehensive guide for writing tests in magenta.nvim, including test environment setup, mock providers, driver interactions, and best practices

Install

mkdir -p .claude/skills/doc-testing && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3175" && unzip -o skill.zip -d .claude/skills/doc-testing && rm skill.zip

Installs to .claude/skills/doc-testing

About this skill

Testing in magenta.nvim

To run the full test suite, use npx vitest run from the project root. To run a specific test file, use npx vitest run <file>. In both cases, there is no need to cd first. Test files should use the .test.ts extension (e.g., myFeature.test.ts) and should make use of the node/test/preamble.ts helpers. For integration-level testing, such as user flows, use the withDriver helper and the interactions in node/test/driver.ts. When performing generic user actions that may be reusable between tests, add them to the NvimDriver class as helpers.

As of July 2025, the test infrastructure supports concurrent execution, and tests run in parallel for improved performance.

Test Environment Setup

Fixture Files & Directory Structure:

  • Each test gets a fresh temporary directory in /tmp/magenta-test/{testId}/
  • Files from node/test/fixtures/ are copied into this temp directory for each test
  • Available fixture files include poem.txt, test.jpg, sample2.pdf, test.bin, and others
  • Nvim runs in this temporary directory, so files can be safely mutated during tests
  • The temp directory is automatically cleaned up after each test - no manual cleanup needed
  • Use await getcwd(driver.nvim) to get the current working directory for file path operations
  • The temporary directory is completely isolated between tests

Test Pattern:

import { withDriver } from "../test/preamble";

test("my test", async () => {
  await withDriver({}, async (driver) => {
    // Test code here - nvim runs in temp dir with fixture files
    // Access cwd with: const cwd = await getcwd(driver.nvim)
  });
});

Custom File Setup:

test("test with custom files", async () => {
  await withDriver(
    {
      setupFiles: async (tmpDir) => {
        const fs = await import("fs/promises");
        const path = await import("path");
        await fs.writeFile(path.join(tmpDir, "custom.txt"), "content");
        await fs.mkdir(path.join(tmpDir, "subfolder"));
      },
    },
    async (driver) => {
      // Custom files are now available in the test environment
    },
  );
});

Directory Structure:

The test environment creates an isolated directory structure:

  • baseDir: /tmp/magenta-test/{testId}/ - root of all test directories
  • tmpDir: {baseDir}/cwd/ - the working directory where nvim runs (fixtures copied here)
  • homeDir: {baseDir}/home/ - simulated home directory ($HOME is set to this)

The withDriver callback receives a dirs object with all three paths:

await withDriver({}, async (driver, dirs) => {
  console.log(dirs.tmpDir); // /tmp/magenta-test/abc123/cwd
  console.log(dirs.homeDir); // /tmp/magenta-test/abc123/home
  console.log(dirs.baseDir); // /tmp/magenta-test/abc123
});

Setting Up Home Directory Files:

Use setupHome to create files in the simulated home directory. This is useful for testing features that read from ~/.magenta/ or other home directory paths:

test("test with home directory config", async () => {
  await withDriver(
    {
      setupHome: async (homeDir) => {
        const fs = await import("fs/promises");
        const path = await import("path");
        // Create ~/.magenta/options.json
        const magentaDir = path.join(homeDir, ".magenta");
        await fs.mkdir(magentaDir, { recursive: true });
        await fs.writeFile(
          path.join(magentaDir, "options.json"),
          JSON.stringify({
            filePermissions: [{ path: "~/Documents", read: true }],
          }),
        );
      },
    },
    async (driver) => {
      // Magenta will load options from the simulated ~/.magenta/options.json
    },
  );
});

Setting Up Directories Outside CWD:

Use setupExtraDirs to create directories outside the working directory. This is useful for testing file permission boundaries:

test("test with external directories", async () => {
  let outsidePath: string;

  await withDriver(
    {
      setupExtraDirs: async (baseDir) => {
        const fs = await import("fs/promises");
        const path = await import("path");
        // Create a directory outside cwd
        outsidePath = path.join(baseDir, "outside");
        await fs.mkdir(outsidePath, { recursive: true });
        await fs.writeFile(path.join(outsidePath, "secret.txt"), "secret");
      },
    },
    async (driver, dirs) => {
      // outsidePath is outside dirs.tmpDir, so file access should be restricted
      // unless explicitly permitted via filePermissions
    },
  );
});

Combined Setup for Permission Testing:

A common pattern for testing file permissions is to use both setupExtraDirs and setupHome together:

test("can access external dir with filePermissions", async () => {
  let outsidePath: string;

  await withDriver(
    {
      setupExtraDirs: async (baseDir) => {
        const fs = await import("fs/promises");
        const path = await import("path");
        outsidePath = path.join(baseDir, "outside");
        await fs.mkdir(outsidePath, { recursive: true });
        await fs.writeFile(path.join(outsidePath, "allowed.txt"), "content");

        // Write options.json here since we now have the path
        const homeDir = path.join(baseDir, "home");
        const magentaDir = path.join(homeDir, ".magenta");
        await fs.mkdir(magentaDir, { recursive: true });
        await fs.writeFile(
          path.join(magentaDir, "options.json"),
          JSON.stringify({
            filePermissions: [{ path: outsidePath, read: true }],
          }),
        );
      },
    },
    async (driver) => {
      // Tools can now access outsidePath due to filePermissions
    },
  );
});

Available Mocks & Test Interactions

Configuring Magenta Options:

Tests can override magenta options by passing them to withDriver:

test("test with custom options", async () => {
  await withDriver(
    {
      options: {
        getFileAutoAllowGlobs: ["*.log", "config/*"],
        changeDebounceMs: 100,
        // Any other MagentaOptions can be overridden here
      },
    },
    async (driver) => {
      // Magenta will use the custom options
    },
  );
});

Available options include:

  • getFileAutoAllowGlobs - Array of glob patterns for auto-allowing file reads
  • changeDebounceMs - Override the default change tracking debounce
  • Any other options from MagentaOptions type

Mock Provider Interactions:

The mock provider (driver.mockAnthropic) uses MockStream objects that mirror Anthropic's streaming API. Streams contain Anthropic-formatted messages (Anthropic.MessageParam[]), not our internal ProviderMessage[] format.

Required Type Imports for Tests:

import type Anthropic from "@anthropic-ai/sdk";

type ToolResultBlockParam = Anthropic.Messages.ToolResultBlockParam;
type ContentBlockParam = Anthropic.Messages.ContentBlockParam;
type TextBlockParam = Anthropic.Messages.TextBlockParam;
type DocumentBlockParam = Anthropic.Messages.DocumentBlockParam;

Awaiting Streams:

// Wait for any pending stream
const stream = await driver.mockAnthropic.awaitPendingStream();

// Wait for stream with specific text in message content
const stream =
  await driver.mockAnthropic.awaitPendingStreamWithText("specific text");

// Wait for user message (tool results, etc.)
const stream = await driver.mockAnthropic.awaitPendingUserRequest();

// Wait for forced tool use requests
const forceRequest =
  await driver.mockAnthropic.awaitPendingForceToolUseRequest();

// Check if there's a pending stream with specific text (non-blocking)
const hasPending = driver.mockAnthropic.hasPendingStreamWithText("text");

Responding to Streams:

// Simple text response
stream.respond({
  stopReason: "end_turn",
  text: "Response text",
  toolRequests: [],
});

// Response with tool use
stream.respond({
  stopReason: "tool_use",
  text: "I'll use a tool",
  toolRequests: [
    {
      status: "ok",
      value: {
        id: "tool_id" as ToolRequestId,
        toolName: "get_file" as ToolName,
        input: { filePath: "./file.txt" as UnresolvedFilePath },
      },
    },
  ],
});

// Response with error tool request
stream.respond({
  stopReason: "tool_use",
  text: "Tool failed",
  toolRequests: [
    {
      status: "error",
      rawRequest: { invalid: "request" },
    },
  ],
});

Responding to Force Tool Use Requests:

const forceRequest =
  await driver.mockAnthropic.awaitPendingForceToolUseRequest();

// Successful tool response
await driver.mockAnthropic.respondToForceToolUse({
  toolRequest: {
    status: "ok",
    value: {
      id: "tool_id" as ToolRequestId,
      toolName: "get_file" as ToolName,
      input: { filePath: "./file.txt" as UnresolvedFilePath },
    },
  },
  stopReason: "tool_use",
});

// Error tool response
await driver.mockAnthropic.respondToForceToolUse({
  toolRequest: {
    status: "error",
    rawRequest: { invalid: "data" },
  },
  stopReason: "tool_use",
});

Stream Inspection:

// Access stream properties (Anthropic format)
console.log(stream.messages); // Anthropic.MessageParam[] - raw Anthropic format
console.log(stream.getProviderMessages()); // ProviderMessage[] - converted format
console.log(stream.systemPrompt); // System prompt (if any)

// For force tool use requests
console.log(forceRequest.spec); // Tool specification
console.log(forceRequest.model); // Model used
console.log(forceRequest.messages); // Message history

// Check if stream was aborted
if (stream.aborted) {
  // Handle aborted stream
}

Advanced Response Patterns:

// Stream individual parts of response
stream.streamText("First part of response");
stream.streamToolUse(toolId, toolName, input);
stream.streamThinking("Thinking content", "signature");
stream.finishResponse("end_turn");

// Respond with errors
stream.respondWithError(new Error("Something went wrong"));

Mock Provider:

  • driver.mockAnthropic - Pre-configured mock provider that captures all streams
  • await driver.mockAnthropic.awaitPendingStream() - Wait for regular message streams
  • await driver.mockAnthropic.awaitPendingStreamWithText("text") - Wait for stream containing specific text
  • await driver.mockAnthropic.awaitPendingForceToolUseRequest() - Wait for forced tool use requests
  • await driver.mockAnthropic.respondToForceToolUse({...}) - Send mock responses
  • No need to manually mock providers - they're already set up in the test infrastructure

Tool Result Content Structure (Important!):

Anthropic's ToolResultBlockParam has a different structure than our internal ProviderToolResult:

// Our internal format (ProviderToolResult):
{
  type: "tool_result",
  id: ToolRequestId,
  result: {
    status: "ok" | "error",
    value: ProviderToolResultContent[], // nested here
    error?: string,
  }
}

// Anthropic format (ToolResultBlockParam) - what you see in stream.messages:
{
  type: "tool_result",
  tool_use_id: string,         // different field name!
  content: string | ContentBlockParam[],  // different field name!
  is_error?: boolean,          // different error indicator!
}

Document Blocks are Siblings, Not Nested:

When documents are sent to Anthropic, they appear as sibling blocks in the user message, not nested inside tool_result.content:

// User message content array:
[
  { type: "tool_result", tool_use_id: "...", content: [], is_error: false },
  { type: "document", source: {...}, title: "..." }  // <-- sibling, not nested!
]
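
A test can pull these sibling blocks out of a user message with a small pure helper. The sketch below uses minimal hand-rolled types mirroring the Anthropic block shapes; findSiblingDocuments is a hypothetical illustration, not part of the magenta.nvim test API:

```typescript
// Minimal structural types mirroring the Anthropic block shapes shown above.
type Block =
  | { type: "tool_result"; tool_use_id: string; content: unknown; is_error?: boolean }
  | { type: "document"; source: unknown; title?: string }
  | { type: "text"; text: string };

// Collect document blocks that sit alongside (not inside) tool_result blocks.
function findSiblingDocuments(content: Block[]): Block[] {
  return content.filter((block) => block.type === "document");
}

const content: Block[] = [
  { type: "tool_result", tool_use_id: "t1", content: [], is_error: false },
  { type: "document", source: { type: "base64" }, title: "sample2.pdf" },
];

const docs = findSiblingDocuments(content);
```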

Finding Tool Results in Stream Messages:

const stream = await driver.mockAnthropic.awaitPendingStream();

// Find user message containing the tool result
let userMessageContent: ContentBlockParam[] | undefined;
for (const msg of stream.messages) {
  if (msg.role === "user" && Array.isArray(msg.content)) {
    const content = msg.content as ContentBlockParam[];
    const hasToolResult = content.some(
      (block: ContentBlockParam) => block.type === "tool_result",
    );
    if (hasToolResult) userMessageContent = content;
  }
}

// Get the tool result block
const toolResult = userMessageContent!.find(
  (block: ContentBlockParam) => block.type === "tool_result",
) as ToolResultBlockParam;

// Check for errors
expect(toolResult.is_error).toBeFalsy();

// Access content (note: might be string or array)
if (Array.isArray(toolResult.content)) {
  const textContent = toolResult.content.find(
    (item: ContentBlockParam) => item.type === "text",
  ) as TextBlockParam;
}
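
The search loop above can be factored into a reusable pure helper. findFirstToolResult and the minimal types here are hypothetical sketches, not part of the driver API; in a real test you would either keep the inline loop or move a helper like this into shared test utilities:

```typescript
// Hypothetical pure helper mirroring the search loop above: scan
// Anthropic-shaped messages for the first user message carrying a
// tool_result block and return that block.
type ToolResultLike = {
  type: "tool_result";
  tool_use_id: string;
  content: string | { type: string }[];
  is_error?: boolean;
};
type MessageLike = { role: string; content: string | { type: string }[] };

function findFirstToolResult(messages: MessageLike[]): ToolResultLike | undefined {
  for (const msg of messages) {
    if (msg.role === "user" && Array.isArray(msg.content)) {
      const block = msg.content.find((b) => b.type === "tool_result");
      if (block) return block as ToolResultLike;
    }
  }
  return undefined;
}

const result = findFirstToolResult([
  { role: "assistant", content: "using a tool" },
  {
    role: "user",
    content: [
      { type: "tool_result", tool_use_id: "t1", content: [], is_error: false } as ToolResultLike,
    ],
  },
]);
```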

Checking Error Results:

// Anthropic format for errors:
expect(toolResult.is_error).toBe(true);
const errorContent =
  typeof toolResult.content === "string"
    ? toolResult.content
    : JSON.stringify(toolResult.content);
expect(errorContent).toContain("expected error message");

Driver Interactions (prefer these over internal API access):

  • await driver.editFile("poem.txt") - Open fixture files
  • await driver.command("normal! gg") - Execute vim commands
  • await driver.magenta.command("predict-edit") - Execute magenta commands
  • Use real nvim interactions to trigger change tracking naturally

Testing Best Practices:

  • DO: Use realistic nvim interactions (driver.editFile(), driver.command())
  • DON'T: Reach into internal APIs (driver.magenta.changeTracker.onTextDocumentDidChange())
  • DO: Let the system work naturally - make real edits and let change tracking happen
  • DO: Write integration tests that exercise the full user flow
  • DON'T: Mock internal components - use the provided driver and mock provider

Change Tracker Testing:

  • DO: Use driver.assertChangeTrackerHasEdits(count) and driver.assertChangeTrackerContains(changes) instead of arbitrary timeouts
  • DO: Be aware that rapid edits may be batched into single changes by the tracker
  • DO: Use explicit assertions about what changes should be tracked rather than waiting fixed amounts of time
  • DON'T: Use setTimeout() or fixed delays when waiting for change tracking - use the assertion methods instead

Mock Stream Objects: Streams captured by awaitPendingStream() contain:

  • stream.messages - Anthropic.MessageParam[] (raw Anthropic format)
  • stream.getProviderMessages() - ProviderMessage[] (converted format for easier assertions)
  • stream.systemPrompt - The system prompt used (if any)
  • stream.aborted - Whether the stream was aborted
  • stream.resolved - Whether the stream has finished

Force tool use requests captured by awaitPendingForceToolUseRequest() contain:

  • request.spec - The tool specification used
  • request.model - Which model was requested
  • request.messages - The messages array containing user/assistant conversation
  • request.systemPrompt - The system prompt used (if any)
  • request.defer - Promise resolution control

Type Narrowing with expect():

expect() assertions don't narrow TypeScript's discriminated unions. Add explicit guards:

expect(documentContent.source.type).toBe("base64");
// This doesn't narrow the type, so add:
if (documentContent.source.type !== "base64")
  throw new Error("Expected base64 source");
// Now TypeScript knows source has media_type and data
expect(documentContent.source.media_type).toBe("application/pdf");

System Reminders in Mock Streams:

System reminders are an internal ProviderMessage type (system_reminder) that get converted to plain text blocks with <system-reminder> tags when sent to Anthropic:

// In tests checking mock stream messages, search for text blocks containing the tag:
function findSystemReminderText(
  content: string | ContentBlockParam[],
): TextBlockParam | undefined {
  if (typeof content === "string") return undefined;
  return content.find(
    (c): c is TextBlockParam =>
      c.type === "text" && c.text.includes("<system-reminder>"),
  );
}

System Prompt vs User Messages: When implementing AI features, maintain proper separation:

  • System prompt: General instructions about the agent's role and behavior ("You have to do your best to predict...")
  • User messages: Specific contextual data (buffer content, cursor position, recent changes)

This separation keeps the system prompt focused on behavior while allowing dynamic context in messages.
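
The separation can be sketched as follows. All names here (SYSTEM_PROMPT, EditContext, buildUserMessage) are hypothetical illustrations, not magenta.nvim APIs: behavior instructions live in a fixed system prompt, while per-request context is rendered into the user message.

```typescript
// Fixed behavior instructions: no buffer content or cursor data here.
const SYSTEM_PROMPT =
  "You have to do your best to predict the user's next edit.";

type EditContext = {
  bufferContent: string;
  cursor: { row: number; col: number };
};

// Dynamic context goes into the user message, rebuilt per request.
function buildUserMessage(ctx: EditContext): string {
  return [
    `Buffer:\n${ctx.bufferContent}`,
    `Cursor: ${ctx.cursor.row}:${ctx.cursor.col}`,
  ].join("\n");
}

const message = buildUserMessage({
  bufferContent: "const x = 1;",
  cursor: { row: 0, col: 12 },
});
```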

Test Writing Best Practices

Avoid Conditional Expect Statements

DON'T write tests with conditional expects like this:

if (toolResult && toolResult.type === "tool_result") {
  expect(toolResult.result.status).toBe("ok");
  if (toolResult.result.status === "ok") {
    const textContent = toolResult.result.value.find(
      (item) => item.type === "text",
    );
    if (textContent && textContent.type === "text") {
      expect(textContent.text).toContain("expected content");
    }
  }
}

DO use TypeScript type assertions and direct expects:

const toolResult = toolResultMessage.content[0] as Extract<
  (typeof toolResultMessage.content)[0],
  { type: "tool_result" }
>;
expect(toolResult.type).toBe("tool_result");
expect(toolResult.result.status).toBe("ok");

const result = toolResult.result as Extract<
  typeof toolResult.result,
  { status: "ok" }
>;

const textContent = result.value.find(
  (item) => item.type === "text",
) as Extract<(typeof result.value)[0], { type: "text" }>;
expect(textContent).toBeDefined();
expect(textContent.text).toContain("expected content");

TypeScript Type Narrowing in Tests

Use TypeScript's Extract utility type to narrow union types safely:

// For narrowing message content
const toolResult = content[0] as Extract<
  (typeof content)[0],
  { type: "tool_result" }
>;

// For narrowing result status
const okResult = toolResult.result as Extract<
  typeof toolResult.result,
  { status: "ok" }
>;

const errorResult = toolResult.result as Extract<
  typeof toolResult.result,
  { status: "error" }
>;

Test Structure Patterns

Basic Test Structure

it("should do something", async () => {
  await withDriver({}, async (driver) => {
    await driver.showSidebar();

    // Trigger the action
    await driver.inputMagentaText(`Some command`);
    await driver.send();

    // Mock the response
    const request = await driver.mockAnthropic.awaitPendingRequest();
    request.respond({
      stopReason: "tool_use",
      text: "response text",
      toolRequests: [
        /* tool requests */
      ],
    });

    // Assert the UI state
    await driver.assertDisplayBufferContains("Expected UI text");

    // Handle tool result and verify
    const toolResultRequest = await driver.mockAnthropic.awaitPendingRequest();
    const toolResultMessage =
      toolResultRequest.messages[toolResultRequest.messages.length - 1];

    // Type-safe assertions
    expect(toolResultMessage.role).toBe("user");
    expect(Array.isArray(toolResultMessage.content)).toBe(true);

    const toolResult = toolResultMessage.content[0] as Extract<
      (typeof toolResultMessage.content)[0],
      { type: "tool_result" }
    >;
    expect(toolResult.type).toBe("tool_result");
    expect(toolResult.result.status).toBe("ok");
  });
});

Tests with File Setup

it("should handle custom files", async () => {
  await withDriver(
    {
      setupFiles: async (tmpDir) => {
        const fs = await import("fs/promises");
        const path = await import("path");
        await fs.writeFile(path.join(tmpDir, "test.txt"), "content");
      },
    },
    async (driver) => {
      // Test implementation
    },
  );
});

Tests with Custom Options

it("should respect configuration", async () => {
  await withDriver(
    {
      options: {
        someOption: ["value1", "value2"],
      },
    },
    async (driver) => {
      // Test implementation
    },
  );
});

Common Assertion Patterns

UI Assertions

// Check for presence
await driver.assertDisplayBufferContains("Expected text");

// Check for absence
await driver.assertDisplayBufferDoesNotContain("Unwanted text");

// Get position for interactions
const buttonPos = await driver.assertDisplayBufferContains("[ YES ]");
await driver.triggerDisplayBufferKey(buttonPos, "<CR>");

Tool Result Assertions

// Use helper functions when available
assertToolResultContainsText(toolResult, "expected text");
assertToolResultHasImageSource(toolResult, "image/jpeg");

// Manual assertions for specific cases
const result = toolResult.result as Extract<
  typeof toolResult.result,
  { status: "ok" }
>;
const textContent = result.value.find(
  (item) => item.type === "text",
) as Extract<(typeof result.value)[0], { type: "text" }>;
expect(textContent.text).toContain("expected content");

Change Tracker Assertions

// Use specific assertions instead of timeouts
await driver.assertChangeTrackerHasEdits(2);
await driver.assertChangeTrackerContains([
  { type: "edit", filePath: "file.txt" },
]);

// DON'T use arbitrary timeouts
// await new Promise(resolve => setTimeout(resolve, 1000)); // ❌

Testing Best Practices

Integration Over Unit

  • Prefer testing complete user flows over isolated units
  • Use realistic nvim interactions rather than reaching into internal APIs
  • Let the system work naturally (e.g., let change tracking happen through real edits)

Mock Boundaries

  • Mock external services (Anthropic API) but not internal components
  • Use the provided driver and mock infrastructure
  • Don't manually mock internal classes or methods

Realistic Interactions

// DO: Use realistic interactions
await driver.editFile("poem.txt");
await driver.command("normal! gg");

// DON'T: Reach into internals
// driver.magenta.changeTracker.onTextDocumentDidChange(...); // ❌

File Handling

  • Each test gets a fresh temporary directory
  • Fixture files are automatically copied for each test
  • Files can be safely mutated during tests
  • Use the setupFiles callback for custom file creation

Error Testing

  • Test both success and error paths
  • Verify error messages are meaningful
  • Test edge cases like invalid input, missing files, etc.
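
For error-path assertions, the "Checking Error Results" pattern from earlier can be wrapped in a small pure helper that normalizes a tool_result's content (string or block array) into searchable text. toolResultErrorText is a hypothetical sketch, not part of the magenta.nvim test API:

```typescript
// Minimal shape of an Anthropic tool_result block, as seen in stream.messages.
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string | { type: string; text?: string }[];
  is_error?: boolean;
};

// Return searchable error text for failed tool results, "" for successes.
function toolResultErrorText(block: ToolResultBlock): string {
  if (!block.is_error) return "";
  return typeof block.content === "string"
    ? block.content
    : JSON.stringify(block.content);
}

const errorText = toolResultErrorText({
  type: "tool_result",
  tool_use_id: "t1",
  content: [{ type: "text", text: "ENOENT: no such file" }],
  is_error: true,
});
```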

Async Patterns

  • Always await async operations
  • Use the driver's assertion methods that handle timing
  • Don't use fixed delays unless absolutely necessary
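
Conceptually, the driver's timing-aware assertions behave like a poll-until-deadline loop rather than a fixed sleep. pollUntil below is a hypothetical utility sketching that idea, not part of the magenta.nvim test API:

```typescript
// Poll a condition until it holds or a deadline passes, instead of
// sleeping a fixed amount and hoping the system has settled.
async function pollUntil(
  check: () => boolean,
  { timeoutMs = 1000, intervalMs = 10 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!check()) {
    if (Date.now() > deadline) throw new Error("condition not met in time");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated async state change: becomes true shortly after start.
let ready = false;
setTimeout(() => {
  ready = true;
}, 20);
```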
