Cometix Indexer

Cometix Indexer

Official
CometixAI

Provides semantic code search capabilities by indexing local workspaces through Cursor's backend RepositoryService. Enables fast semantic searches across your codebase with automatic synchronization.

A local indexing and retrieval service that enables semantic code search by wrapping Cursor's backend RepositoryService. It provides tools for indexing local workspaces and performing incremental semantic searches with automatic synchronization.

64153 views26Local (stdio)

What it does

  • Index local workspaces for semantic search
  • Perform semantic code searches with natural language queries
  • Filter search results with glob patterns
  • Automatically sync changes for fresh search results
  • Return exact file locations with line numbers

Best for

Developers wanting semantic code search in large codebasesCode navigation and discovery across projectsUnderstanding code functionality through natural language queries
Uses Cursor's proven indexing backendAutomatic incremental synchronizationSupports glob filtering for targeted searches

About Cometix Indexer

Cometix Indexer is an official MCP server published by CometixAI that provides AI assistants with tools and capabilities via the Model Context Protocol. Cometix Indexer — local code indexer for fast semantic code search. Index workspaces and run incremental searches with a It is categorized under developer tools.

How to install

You can install Cometix Indexer in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

Cometix Indexer is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Cometix Indexer(MCP 服务器)

语义代码搜索的本地索引与检索服务。该项目实现了一个基于 Model Context Protocol(MCP)的服务端,封装了对 Cursor 后端 RepositoryService 的建库、同步与搜索流程,通过两类 MCP 工具对外提供能力:项目索引(index_project)与语义搜索(codebase_search)。

TODOs

  • Cursor Indexing(ok)
  • Warp Embedding
  • Trae Indexing
  • Augment Indexing
  • Github Indexing(来源Copilot Indexing)

功能概述

  • 索引:扫描本地工作区、生成文件清单、分批上传至 Cursor 服务端并完成建库标记。
  • 增量同步:监听文件变更,按需进行轻量同步,保证搜索前的索引新鲜度。
  • 语义搜索:调用远端检索接口并自动解密返回的加密路径,直观展示命中。
  • 运行形态:作为 MCP 服务器通过 stdio 运行并响应工具调用。

MCP

Claude Code

npx格式

{
  "mcpServers": {
    "cometix-indexer": {
      "command": "npx",
      "args": [
        "-y",
        "--package=git+https://github.com/CometixAI/Cometix-Indexer.git",
        "cometix-indexer",
        "start"
      ],
      "env": {
        "CURSOR_AUTH_TOKEN": "",
        "CURSOR_BASE_URL": "https://api2.cursor.sh"
      }
    }
  }
}

npm格式(Local)

{
  "mcpServers": {
    "cometix-indexer": {
      "command": "npm",
      "args": [
        "--prefix",
        "<path>",
        "run",
        "start"
      ],
      "env": {
        "CURSOR_AUTH_TOKEN": "",
        "CURSOR_BASE_URL": "https://api2.cursor.sh"
      }
    }
  }
}

MCP 工具

  • index_project

    • 入参:{ workspacePath: string; verbose?: boolean }
    • 行为:初始化/刷新索引,按批全量上传并计划自动同步;当 verbose=true 时,额外返回本轮上传的相对路径文件列表。
    • 返回:{ codebaseId, uploaded, batches, nextSyncAt }
  • codebase_search

    • 入参:{ query: string; paths_include_glob?: string; paths_exclude_glob?: string; max_results?: number }
    • 行为:在“唯一一个”已索引工作区内进行搜索;支持包含/排除 glob 过滤(基于工作区相对路径),并在搜索前进行按需增量同步。
    • 返回:{ total, hits: Array<{ path, score, startLine, endLine }> }

示例(概念性):

{
  "name": "index_project",
  "arguments": { "workspacePath": "E:/project" }
}
{
  "name": "codebase_search",
  "arguments": { "query": "What is the xxx paper", "paths_include_glob": "src/**/*.rs", "paths_exclude_glob": "**/tests/**", "max_results": 50 }
}

目录结构(核心)

  • src/index.ts:进程入口。解析 CLI/环境变量,创建 MCP Server 并接入 stdio 传输。
  • src/server.ts:注册 MCP 工具:index_projectcodebase_search
  • src/services/repositoryIndexer.ts:索引与同步核心逻辑(初次建库、分批上传、增量同步、定时器)。
  • src/services/codeSearcher.ts:搜索逻辑(预同步、远端搜索、结果解密与规整)。
  • src/services/fileWatcher.ts:文件变更监听,标记 pendingChanges
  • src/services/stateManager.ts:工作区状态持久化(state.json)。
  • src/crypto/pathEncryption.ts:路径分段加解密方案与 Windows/Posix 互转。
  • src/client/proto.ts:加载 proto/repository_service.proto 并以 protobuf 编解码发送 HTTP 请求。
  • src/client/cursorApi.ts:封装调用的具体 RepositoryService 接口。
  • src/utils/env.ts:配置解析、默认参数与请求头。
  • src/utils/fs.ts:忽略规则、文件遍历与可嵌入文件清单读取。
  • src/utils/semaphore.ts:并发控制与带重试的信号量。

工作原理

  1. 初次索引
  • 扫描工作区(忽略 node_modules/.git/dist/ 等)并生成默认清单 embeddable_files.txt(每个工作区独立存放)。
  • 基于 @anysphere/file-serviceMerkleClient 构建目录 Merkle 树,获取 rootHashsimhash
  • 生成路径加密密钥(pathKey),并以 V1MasterKeyedEncryptionScheme 对相对路径逐段加密。
  • 将文件按批(INITIAL_UPLOAD_MAX_FILES)执行完整流程:
    • FastRepoInitHandshakeV2 握手(返回 codebaseId)。
    • 上传本批文件(FastUpdateFileV2)。
    • EnsureIndexCreatedFastRepoSyncComplete 标记索引完成。
  • codebaseIdpathKeyorthogonalTransformSeed 等持久化到工作区状态 state.json
  1. 增量同步
  • chokidar 监听文件变更,只标记 pendingChanges = true(轻量)。
  • 搜索或定时器触发时,如存在变更:
    • 使用 SyncMerkleSubtreeV2 对目录节点进行比对,定位不匹配的子树与文件。
    • 对变更文件执行同批上传与 EnsureIndexCreated/FastRepoSyncComplete
    • 清理 pendingChanges 标记并持久化。
  1. 语义搜索
  • 搜索前先触发一次按需增量同步以保证结果新鲜。
  • 调用 SearchRepositoryV2 并对返回的加密路径用本地 pathKey 解密为 Posix 相对路径,输出 { path, score, startLine, endLine }

运行要求

  • Node.js >= 18
  • proto/repository_service.proto 必须存在(仓库已附带)。

启动(npx)

npx -y --package=git+https://github.com/CometixAI/Cometix-Indexer.git cometix-indexer -- --auth-token "$CURSOR_AUTH_TOKEN" --base-url https://api2.cursor.sh

启动(npm scripts)

# 安装依赖并构建
npm install
npm run build

# 方式一:通过环境变量(PowerShell 示例)
$env:CURSOR_AUTH_TOKEN="你的Token"; npm run start

# 方式二:通过参数传递(-- 之后的参数会透传给脚本)
npm run start -- --auth-token 你的Token --base-url https://api2.cursor.sh --log-level info

# 开发模式(监听编译;运行需要另开终端执行 start)
npm run dev
# 另开一个终端
npm run start -- --auth-token 你的Token

可用环境变量:

  • CURSOR_AUTH_TOKEN(必需)
  • CURSOR_BASE_URL(默认 https://api2.cursor.sh
  • LOG_LEVELdebug | info | warning | error,默认 info

环境变量与默认值(可调优)

  • SYNC_CONCURRENCY(默认 4)
  • SYNC_MAX_NODES(默认 2000)
  • SYNC_MAX_ITERATIONS(默认 10000)
  • SYNC_LIST_LIMIT(默认 1000)
  • FILE_SIZE_LIMIT_BYTES(默认 2MB,超出将跳过)
  • INITIAL_UPLOAD_MAX_FILES(默认 10,初次索引分批大小)
  • PROTO_TIMEOUT_MS(默认 30000)
  • PROTO_SEARCH_TIMEOUT_MS(默认 60000)
  • AUTO_SYNC_INTERVAL_MS(默认 5 分钟)

开发安装与构建

npm install
npm run build

状态与数据持久化

  • 工作区专属数据目录:%USERPROFILE%/.cometix/cursor-indexer/<safeName>-<hash>/
    • state.json:保存 codebaseIdpathKeyorthogonalTransformSeed 等。
    • embeddable_files.txt:首次索引生成的可嵌入文件列表,可手动编辑以精确控制索引范围。

路径加密与兼容性

  • 采用分段对称加密(aes-256-ctr)并在 Windows 相对路径(以 ./.\ 起始)层面进行,避免泄露真实目录结构。
  • 搜索结果会自动尝试使用本地 pathKey 解密为 Posix 相对路径,失败时回退为原始加密串。

忽略与限制

  • 默认忽略:node_modules/.git/.cursor/dist/build/coverage/ 等。
  • 超过 FILE_SIZE_LIMIT_BYTES 的文件会被跳过。

常见问题

  • 报错 repository_service.proto not found:请确认项目根目录存在 proto/repository_service.proto
  • Missing CURSOR_AUTH_TOKEN:通过 --auth-token 传参或设置环境变量 CURSOR_AUTH_TOKEN

许可

MIT

Alternatives

Related Skills

Browse all skills
ui-design-system

UI design system toolkit for Senior UI Designer including design token generation, component documentation, responsive design calculations, and developer handoff tools. Use for creating design systems, maintaining visual consistency, and facilitating design-dev collaboration.

18
ai-sdk

Answer questions about the AI SDK and help build AI-powered features. Use when developers: (1) Ask about AI SDK functions like generateText, streamText, ToolLoopAgent, embed, or tools, (2) Want to build AI agents, chatbots, RAG systems, or text generation features, (3) Have questions about AI providers (OpenAI, Anthropic, Google, etc.), streaming, tool calling, structured output, or embeddings, (4) Use React hooks like useChat or useCompletion. Triggers on: "AI SDK", "Vercel AI SDK", "generateText", "streamText", "add AI to my app", "build an agent", "tool calling", "structured output", "useChat".

6
api-documenter

Master API documentation with OpenAPI 3.1, AI-powered tools, and modern developer experience practices. Create interactive docs, generate SDKs, and build comprehensive developer portals. Use PROACTIVELY for API documentation or developer portal creation.

4
openai-knowledge

Use when working with the OpenAI API (Responses API) or OpenAI platform features (tools, streaming, Realtime API, auth, models, rate limits, MCP) and you need authoritative, up-to-date documentation (schemas, examples, limits, edge cases). Prefer the OpenAI Developer Documentation MCP server tools when available; otherwise guide the user to enable `openaiDeveloperDocs`.

4
cli-builder

Guide for building TypeScript CLIs with Bun. Use when creating command-line tools, adding subcommands to existing CLIs, or building developer tooling. Covers argument parsing, subcommand patterns, output formatting, and distribution.

3
ydc-ai-sdk-integration

Integrate Vercel AI SDK applications with You.com tools (web search, AI agent, content extraction). Use when developer mentions AI SDK, Vercel AI SDK, generateText, streamText, or You.com integration with AI SDK.

2