Updated June 2026 · Cookbook · 16 min read

Cursor Codebase Indexing skill: 10 setup & tuning recipes

Ten real workflows for Cursor’s codebase index, driven from Claude Code — index audits, .cursorignore authoring, the two-ignore-file split, monorepo scoping, @Codebase query craft, stale-index fixes, the Linux watcher limit, privacy sign-off, indexing private docs, and the Claude Code translation — each as one prompt with the exact output it produces.

Already know what skills are? Skip to the cookbook. First time? Read the explainer then come back. Need the install? It’s on the /skills/cursor-codebase-indexing page.


What this skill actually does

Sixty seconds of context before the cookbook — what the cursor-codebase-indexing skill is, what Claude returns when you invoke it, and the one thing it does NOT do for you.

Set up and optimize Cursor codebase indexing for semantic code search and @Codebase queries.

Jeremy Longshore, the skill author · /skills/cursor-codebase-indexing

What Claude returns

Claude writes `.cursorignore` and `.cursorindexingignore` files in `.gitignore` syntax, scopes monorepo indexing (open the package directory, or exclude sibling packages at the root), and runs the troubleshooting ladder: `Cmd+Shift+P > Cursor: Resync Index`, per-OS cache deletion paths, and the Linux `fs.inotify.max_user_watches` sysctl fix. It also drafts @Codebase query patterns and a privacy checklist covering Turbopuffer storage, Privacy Mode, and air-gapped limits.

What it does NOT do

It does not build the index itself — Cursor's own indexer does that. The skill writes the configuration and the fixes from Claude Code.

How you trigger it

set up cursor codebase indexing for this repo
@codebase returns stale results, fix the index
write a .cursorignore for this monorepo

Cost when idle

~100 tokens at idle (name + description in the system prompt). The full body loads only when a trigger phrase matches.

The system this skill operates is worth one paragraph. When Cursor opens a project, it splits your files into syntactic chunks, converts each chunk to an embedding — a vector that encodes meaning — and stores the vectors in Turbopuffer, a cloud-hosted nearest-neighbor store. An @Codebase question becomes an embedding too; the closest chunks come back as context. A Merkle tree tracks changes so only modified files re-index, roughly every 10 minutes. Cursor’s own A/B tests put numbers on why this matters: semantic search raised answer accuracy by 12.5% on average, with bigger gains on repos past 1,000 files. The index is real leverage — but only if it’s indexing the right files, which is what most of this cookbook is about.
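A toy sketch may make the retrieval step above concrete. It assumes a bag-of-words count vector as a stand-in for a real embedding model and an in-memory dict in place of Turbopuffer. The names `embed`, `cosine`, `nearest_chunks`, and the sample chunks are hypothetical, and none of this is Cursor's actual implementation. Unlike a real embedding, the toy only rewards shared words, so treat it as an illustration of nearest-neighbor ranking and nothing more.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_chunks(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])), reverse=True)
    return ranked[:k]


# Hypothetical chunks; a real index holds one vector per syntactic chunk.
CHUNKS = {
    "billing/charge.ts": "charge the customer card and retry failed payment attempts",
    "auth/session.ts": "create a login session and refresh the auth token",
    "ui/button.tsx": "render a button component with a label",
}

print(nearest_chunks("where do we retry a failed payment", CHUNKS, k=1))
```

The ranking is the part that carries over: every file in the index competes for the top slots, which is why junk chunks from `dist/` or fixtures can crowd out the source you wanted.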

The cookbook

Each entry is a workflow you could run today, in the order I’d teach them: audit first, then the two ignore files, then scoping, querying, and the troubleshooting ladder, ending with the two power moves — indexing private docs and porting the whole mental model to Claude Code. Every entry pairs with skills or MCP servers from mcp.directory; the prompts assume Claude Code with the skill installed, per our Claude Code best practices. Run them against a real repo, not a toy — index problems only show up at size.

Install + README

If the skill isn’t on your machine yet, here’s the one-liner. The full install panel (Codex, Copilot, Antigravity variants) is on the skill page. No setup steps after install — the skill is pure reference and configuration; Cursor itself does the indexing.

One-line install · by jeremylongshore

Install

mkdir -p .claude/skills/cursor-codebase-indexing && curl -L -o skill.zip "https://mcp.directory/api/skills/download/1795" && unzip -o skill.zip -d .claude/skills/cursor-codebase-indexing && rm skill.zip

Installs to .claude/skills/cursor-codebase-indexing

Watch the index work

Before ten prompts that tune the pipeline, watch the pipeline. Ben Dicken’s animation of how Cursor searches code is the fastest route to the right mental model — chunks, vectors, nearest-neighbor — and everything in the cookbook assumes it.

01

Audit a fresh project's index before trusting @Codebase

Before asking the index anything, confirm what's actually in it. The skill walks the status check and the included-files view, then flags the junk inflating your index.

For: Anyone opening a repo in Cursor for the first time and wondering why indexing is still running.

The prompt

Use the cursor-codebase-indexing skill. I just opened this repo in Cursor and the status bar has said "Indexing..." for twenty minutes. Walk me through verifying index status, then look at this repo's tree and tell me which directories are probably being indexed but shouldn't be — build output, vendored deps, generated files, large data. For each, say whether it's already covered by .gitignore (Cursor skips those automatically) or git-tracked and needs an explicit rule. End with a count of files each proposed rule would remove.

What the output looks like

# Status bar (bottom of Cursor):
#   "Indexing..."  → initial embedding pass running
#   "Indexed"      → @Codebase queries ready
# Cursor Settings > Features > Codebase Indexing
#   > View included files

In the index but shouldn't be:
  dist/           4,812 files  git-tracked → needs rule
  fixtures/       1,037 files  git-tracked → needs rule
  node_modules/   covered by .gitignore — already skipped
→ 2 rules, ~5,800 files out of the index
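The per-directory counts in an audit like this can be approximated outside Cursor. Below is a minimal sketch, assuming you feed it the list of git-tracked paths (for example the output of `git ls-files`), since tracked files are the ones `.gitignore` will not exclude for you. The function name `count_by_top_dir` and the sample paths are hypothetical.

```python
from collections import Counter


def count_by_top_dir(paths: list[str]) -> Counter:
    """Count files per top-level directory; root-level files go under '.'."""
    counts = Counter()
    for path in paths:
        head, sep, _ = path.partition("/")
        counts[head + "/" if sep else "."] += 1
    return counts


# In a real repo, build this list from `git ls-files`, e.g.
#   subprocess.run(["git", "ls-files"], capture_output=True, text=True).stdout.splitlines()
# The paths below are made-up sample data.
tracked = ["dist/app.js", "dist/app.js.map", "fixtures/users.json", "src/index.ts", "README.md"]

for directory, n in count_by_top_dir(tracked).most_common():
    print(f"{directory:<12} {n} files")
```

Directories that dominate the count but hold no hand-written source are the first candidates for an ignore rule.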

One-line tweak

Re-run the audit after a big dependency or codegen change — index bloat creeps back in through git-tracked generated files.

02

Write a .cursorignore that cuts the index to signal

One file in the project root decides what Cursor's AI can see. The skill generates it in .gitignore syntax: build artifacts, dependencies, minified bundles, data dumps, and secrets all out.

For: Teams whose @Codebase results keep surfacing dist/ output instead of source.

The prompt

Use the cursor-codebase-indexing skill. Generate a .cursorignore for this repo (Next.js app, Python scripts/ directory, Playwright e2e/). Cover four groups with comments: build artifacts, dependencies, generated files (minified JS, source maps, lockfiles), and large data files. Add a defense-in-depth section for secrets — .env variants and any credentials directories — even though they're gitignored. Only include rules for paths that actually exist or plausibly will; don't paste a generic template.

What the output looks like

# .cursorignore
# Build artifacts
.next/
dist/
out/

# Generated files
*.min.js
*.map
*.lock

# Large data files
*.csv
*.sqlite
fixtures/

# Secrets — defense in depth (also in .gitignore)
.env*
**/credentials/
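To sanity-check a rule list before committing it, you can estimate how many files each rule would remove. The sketch below uses a deliberately simplified matcher: it handles only the rule shapes that appear in the sample above (directory rules ending in `/` and basename globs such as `*.map`). Negation, character classes, and most anchoring subtleties of real `.gitignore` matching are not covered, and this is not Cursor's own matcher. `rule_matches`, `files_removed`, and the sample data are hypothetical.

```python
from fnmatch import fnmatch


def rule_matches(rule: str, path: str) -> bool:
    """Simplified .gitignore-style match for 'dir/' rules and basename globs."""
    parts = path.split("/")
    if rule.endswith("/"):
        body = rule.rstrip("/")
        if body.startswith("**/"):
            body = body[3:]
        if "/" in body:
            # A rule like tests/fixtures/ is anchored at the project root.
            return path.startswith(body.lstrip("/") + "/")
        # A rule like dist/ matches a directory of that name at any depth.
        return body in parts[:-1]
    # Everything else is treated as a glob on the file name, e.g. *.map or .env*
    return fnmatch(parts[-1], rule)


def files_removed(rules: list[str], files: list[str]) -> dict[str, int]:
    """Number of files each rule would remove from the index."""
    return {rule: sum(rule_matches(rule, f) for f in files) for rule in rules}


# Made-up sample data.
RULES = [".next/", "dist/", "*.min.js", "*.map", ".env*", "**/credentials/"]
FILES = [
    "dist/app.js",
    "dist/app.js.map",
    "public/vendor.min.js",
    "src/index.ts",
    ".env.local",
    "scripts/credentials/key.json",
]

for rule, n in files_removed(RULES, FILES).items():
    print(f"{rule:<18} {n}")
```

A rule that reports zero matches is either future-proofing or template residue, which is worth knowing before the file grows.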

One-line tweak

Remember the blast radius: .cursorignore hides files from ALL AI features, not only the index — don't ignore anything you still want to @Files.

03

Split ignore duty with .cursorindexingignore

Some files should stay out of the index but remain reachable when you point at them. .cursorindexingignore excludes from indexing only — @Files still works on those paths.

For: Anyone with big test fixtures or generated docs they occasionally need to reference explicitly.

The prompt

Use the cursor-codebase-indexing skill. Our e2e/recordings/ and tests/fixtures/ directories are huge and pollute @Codebase results, but I still open individual fixtures with @Files while debugging. Set this up so they're excluded from the index but stay accessible to AI features. Explain in one comment block at the top of each file which ignore file does what, so the next person doesn't merge them back into one.

What the output looks like

# .cursorindexingignore
# Excluded from the INDEX only — @Files still works.
tests/fixtures/
e2e/recordings/
docs/.vitepress/dist/

# .cursorignore
# Hidden from indexing AND all AI features.
# Nothing here should ever need an @Files mention.
dist/
.env*
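One failure mode of the two-file split is a path that ends up in both files, which quietly removes the @Files access the split was meant to preserve. A small check along these lines could catch it. It compares rules as literal strings only, so overlapping patterns that are spelled differently will slip through. `parse_rules`, `listed_in_both`, and the file contents are hypothetical.

```python
def parse_rules(text: str) -> set[str]:
    """Return the non-blank, non-comment lines of an ignore file."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.add(line)
    return rules


def listed_in_both(cursorignore: str, indexingignore: str) -> set[str]:
    """Rules that appear verbatim in both ignore files."""
    return parse_rules(cursorignore) & parse_rules(indexingignore)


# Made-up file contents for illustration.
CURSORIGNORE = """\
# Hidden from indexing AND all AI features.
dist/
.env*
tests/fixtures/
"""

INDEXINGIGNORE = """\
# Excluded from the INDEX only.
tests/fixtures/
e2e/recordings/
"""

for rule in sorted(listed_in_both(CURSORIGNORE, INDEXINGIGNORE)):
    print(f"{rule} is in both files: @Files will not reach it")
```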

One-line tweak

When in doubt, start in .cursorindexingignore — it's the reversible choice; promote paths to .cursorignore only when the AI should never see them.

04

Scope a monorepo so indexing finishes today

A 50K-file monorepo can take hours to index from the root. Open the package you're working on instead, or exclude sibling packages at the root — the skill writes both setups.

For: Monorepo developers whose first index never seems to complete.

The prompt

Use the cursor-codebase-indexing skill. Our monorepo has packages/web, packages/mobile, packages/admin, packages/api, and packages/shared. I work in api and shared. Give me both strategies: (a) the cursor command to open only the package directory, and (b) a root-level .cursorignore that excludes the packages I don't touch while keeping api and shared indexed. Tell me which one you'd pick for my case and why, in two sentences.

What the output looks like

# (a) Open the package, not the monorepo:
cursor /path/to/monorepo              # indexes everything — slow
cursor /path/to/monorepo/packages/api # this package only — fast

# (b) Root .cursorignore — keep root open, scope the index:
packages/web/
packages/mobile/
packages/admin/
# packages/api/    ← not listed → indexed
# packages/shared/ ← not listed → indexed
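Strategy (b) is mechanical enough to generate. Here is a minimal sketch, assuming all packages live under one `packages/` directory and you know which ones you work in. The function name `scoped_cursorignore` and its parameters are hypothetical.

```python
def scoped_cursorignore(packages: list[str], keep: set[str], root: str = "packages") -> str:
    """Build root-level .cursorignore text that excludes every package not in `keep`."""
    lines = ["# Root .cursorignore: exclude packages outside the working set"]
    for name in sorted(packages):
        if name in keep:
            # Kept packages are left as comments so the intent stays visible.
            lines.append(f"# {root}/{name}/  (not listed, so indexed)")
        else:
            lines.append(f"{root}/{name}/")
    return "\n".join(lines) + "\n"


print(scoped_cursorignore(["web", "mobile", "admin", "api", "shared"], keep={"api", "shared"}))
```

Regenerating the file when your working set changes is cheaper than editing it by hand, and the commented lines document which packages were deliberately kept.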

One-line tweak

Pick (b) when api imports from shared — cross-package @Codebase answers need both in one index.

05

Ask @Codebase questions that actually hit

@Codebase is nearest-neighbor search over embeddings — it matches meaning, not keywords. The skill teaches the query shapes that exploit that, and when to hand off to @Files or plain grep.

For: Developers who type keyword-style queries into @Codebase and get noise back.

The prompt

Use the cursor-codebase-indexing skill. I'm new to this codebase and need to understand how payments work. Give me five @Codebase queries phrased for semantic search — questions about behavior and flow, not symbol names — covering: where payment processing starts, how failures are retried, where we talk to the payment provider, how refunds differ from charges, and what gets logged. Then show the handoff: once @Codebase finds the file, what do I switch to for the actual edit and why.

What the output looks like

@Codebase how does the payment processing flow work?
@Codebase where do we retry failed payment attempts?
@Codebase find all places where we call the payment provider API
@Codebase how is a refund handled differently from a charge?
@Codebase what gets logged when a payment is processed?

# Discovery → precision handoff:
# 1. @Codebase surfaces src/billing/charge.ts
# 2. switch to @Files src/billing/charge.ts to edit
# @Codebase = high context cost — use it to FIND
# @Files    = low context cost  — use it to WORK

One-line tweak

If you already know the exact string ('PaymentIntentFailed'), skip embeddings entirely: project-wide text search (Ctrl+Shift+F, or Cmd+Shift+F on macOS) is free and exact.

06

Un-stick a stale or stuck index

Post-refactor, @Codebase keeps returning the old file layout. The skill runs the escalation ladder: wait out the re-index window, force a resync, then nuke the local cache.

For: Anyone whose search results reference files that were renamed or deleted last week.

The prompt

Use the cursor-codebase-indexing skill. We renamed src/utils/ to src/lib/ two days ago and @Codebase still answers with the old paths. Give me the fix as an escalation ladder: what the normal re-index latency is and why (change detection), the command-palette resync, and the local cache deletion path for macOS as the last resort. For each rung, tell me how to confirm it worked before escalating.

What the output looks like

# Rung 1 — wait: changed files re-index ~every 10 min
#   (Merkle-tree change detection, modified files only)

# Rung 2 — force it:
Cmd+Shift+P > Cursor: Resync Index
#   confirm: status bar shows indexing progress

# Rung 3 — nuke the local cache, then restart Cursor:
macOS:   ~/Library/Application Support/Cursor/Cache/
Linux:   ~/.config/Cursor/Cache/
Windows: %APPDATA%\Cursor\Cache\
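The change detection behind rung 1 can be sketched in a few lines. This is a flat, one-level simplification: a real Merkle tree hashes per directory so unchanged subtrees can be skipped without visiting their files. Cursor's actual hashing scheme is not shown in this cookbook, so SHA-256 and the names `leaf_hashes`, `root_hash`, and `changed_paths` are assumptions for illustration.

```python
import hashlib


def leaf_hashes(files: dict[str, bytes]) -> dict[str, str]:
    """Hash each file's content; these are the leaves of the tree."""
    return {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}


def root_hash(leaves: dict[str, str]) -> str:
    """Combine sorted (path, leaf hash) pairs into a single root hash."""
    digest = hashlib.sha256()
    for path in sorted(leaves):
        digest.update(f"{path}\0{leaves[path]}\n".encode())
    return digest.hexdigest()


def changed_paths(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Paths that were added, modified, or deleted between two snapshots."""
    if root_hash(old) == root_hash(new):
        return set()  # roots match, so nothing needs re-indexing
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


# Made-up snapshots around a src/utils -> src/lib rename.
old = leaf_hashes({
    "src/utils/date.ts": b"export const now = Date.now;",
    "src/app.ts": b"import './utils/date';",
})
new = leaf_hashes({
    "src/lib/date.ts": b"export const now = Date.now;",
    "src/app.ts": b"import './utils/date';",
})

print(sorted(changed_paths(old, new)))
```

A rename shows up as one deleted path and one added path, so both sides have to be processed before @Codebase stops answering with the old layout.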

One-line tweak

High CPU right after rung 3 is the full re-embedding pass, not a bug — it subsides when the status bar flips to Indexed.

07

Fix Linux file-watcher exhaustion on large repos

On Linux, big projects silently hit the inotify watch limit and indexing stalls without an obvious error. Two sysctl commands fix it; the skill knows both.

For: Linux developers whose index sticks at N% on repos that work fine for macOS teammates.

The prompt

Use the cursor-codebase-indexing skill. Indexing on my Ubuntu machine stalls partway through a ~80K-file repo; the same repo indexes fine on a colleague's Mac. Check whether I'm hitting the inotify file-watcher limit: show me how to read the current limit, raise it temporarily to test, and make the change permanent if it fixes the stall. Note what value you're setting and why that number.

What the output looks like

# Check the current limit
cat /proc/sys/fs/inotify/max_user_watches

# Raise it (temporary — gone after reboot)
sudo sysctl fs.inotify.max_user_watches=524288

# Make it permanent
echo "fs.inotify.max_user_watches=524288" | \
  sudo tee -a /etc/sysctl.conf
sudo sysctl -p
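Before raising the limit, it can help to estimate whether the repo plausibly exhausts it. The sketch below counts directories as a rough proxy, on the assumption that a recursive inotify watcher places about one watch per directory. How Cursor's watcher actually allocates watches is not documented here, and the limit is shared with every other process the user runs, hence the headroom factor. `read_watch_limit`, `count_dirs`, and `likely_exhausted` are hypothetical helpers.

```python
import os
from pathlib import Path

LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def read_watch_limit(limit_file: Path = LIMIT_FILE):
    """Current inotify watch limit, or None where the file is absent (non-Linux)."""
    try:
        return int(limit_file.read_text().strip())
    except (OSError, ValueError):
        return None


def count_dirs(root: str) -> int:
    """Number of directories under root, including root itself."""
    return sum(1 for _ in os.walk(root))


def likely_exhausted(dir_count: int, limit: int, headroom: float = 0.8) -> bool:
    """True when the directory count exceeds the chosen share of the limit."""
    return dir_count > limit * headroom


limit = read_watch_limit()
print("watch limit:", limit)
# Example with made-up numbers: 100,000 directories against a limit of 65,536.
print("likely exhausted:", likely_exhausted(100_000, 65_536))
```

To check a real checkout, pass `count_dirs("/path/to/repo")` and the value from `read_watch_limit()` to `likely_exhausted`.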

One-line tweak

Try recipe 2 first: a lean .cursorignore may keep you under the default limit without touching sysctl at all.

08

Run a privacy and compliance pass before team rollout

Indexing sends embeddings to cloud storage. Before a security team signs off, someone has to answer where code goes, what's retained, and what breaks offline. The skill drafts that answer.

For: Tech leads who need indexing approved by security before the team can use @Codebase.

The prompt

Use the cursor-codebase-indexing skill. Draft the privacy assessment our security team will ask for before we enable codebase indexing org-wide: where embeddings are stored, whether plaintext code is retained server-side, what metadata exists, what Privacy Mode changes, and what happens for the one repo that lives on an air-gapped subnet. Format it as a checklist with a verdict line per item, and flag anything that needs an exception rather than an approval.

What the output looks like

PRIVACY ASSESSMENT — Cursor codebase indexing
[x] Plaintext code stored server-side: NO — embeddings
    + obfuscated metadata only
[x] Vector storage: Turbopuffer (cloud-hosted)
[x] Privacy Mode ON: zero data retention at provider
[!] Air-gapped repo: indexing needs network access to
    the embedding API — UNAVAILABLE offline → exception
[x] .cursorignore covers .env*, **/credentials/ (recipe 2)
→ verdict: approve org-wide; exempt air-gapped repo

One-line tweak

Re-run the assessment when Cursor ships indexing changes — the answers here are version-dependent, and your security team will ask.

09

Index your private docs on purpose

The index isn't only for code. Scrape internal docs to Markdown, drop them in a context/ folder, and let @Codebase answer questions from documentation Cursor would otherwise never see.

For: Teams whose critical knowledge lives in private docs no public docs-lookup tool can reach.

The prompt

Use the cursor-codebase-indexing skill. I have internal API docs and two vendored-library references scraped to Markdown. Set up a context/ folder that is excluded from git but INCLUDED in Cursor's index — I want @Codebase to answer questions from these docs, but they don't belong in version control. Show me the .gitignore and .cursorignore entries that produce that exact split, and explain why the re-include rule is needed given that Cursor skips gitignored files by default.

What the output looks like

# .gitignore — keep scraped docs out of the repo
/context/

# .cursorignore — re-include them for the index
# (Cursor auto-excludes gitignored paths; the negation
#  rule opts this folder back in)
!/context/

context/
├── internal-billing-api.md
├── vendor-sdk-reference.md
└── pg-partitioning-notes.md

@Codebase how do we handle billing webhook retries?

One-line tweak

This is the trick an HN user used to index entire transcribed textbooks — the index doesn't care whether Markdown came from your repo or a PDF.

10

Recreate the workflow in Claude Code

Claude Code has no embedding index — and mostly doesn't need one. The skill maps each Cursor indexing feature to its Claude Code equivalent, with Serena covering the semantic gap.

For: Developers running both editors who want one mental model for code search in each.

The prompt

Use the cursor-codebase-indexing skill as reference. I use Cursor and Claude Code on the same repo. Build me the translation table: for @Codebase semantic search, @Files, @Folders, and exact text search, what is the Claude Code equivalent, and where does the Serena MCP server fit? Be honest about the architectural difference — embeddings versus a language-server symbol graph — and when each approach finds things the other misses.

What the output looks like

Cursor                    → Claude Code equivalent
@Codebase (embeddings)    → Serena MCP: symbol search +
                            references, LSP-backed
@Files file mention       → @path/to/file in the prompt
@Folders                  → directory mention / Glob
Ctrl+Shift+F exact match  → Grep tool (ripgrep)
index maintenance         → none — Serena queries the
                            language server on demand

# embeddings match MEANING; the symbol graph
# matches STRUCTURE. Different misses, both useful.

One-line tweak

Concept queries ('where do we throttle?') favor embeddings; structural queries ('who calls this?') favor the symbol graph — route accordingly.

Community signal

Three voices from people running the index on real codebases: the at-scale endorsement, the creative-use ceiling, and the honest middle of the grep-versus-embeddings debate. All three are verbatim from Hacker News.

AI can search your code today. In cursor this is called “codebase indexing”. We have some million(s) lines of code, orders of magnitude smaller than Facebook, but definitely larger than the average startup. We search with AI tools, through Q&A, and for AI-driven code mods.

jitl (HN) · Hacker News

The at-scale endorsement: an engineer on a multi-million-line codebase using the index for Q&A and code mods, not autocomplete party tricks.

I scrape the docs to Markdown, stick them into a “context” folder, and use Cursor's vector codebase indexing. This allows the agent to literally ask questions like “how do I do ABC with library XYZ?” and the vector database delivers a chunked answer from all available documentation.

electroly (HN) · Hacker News

The creative-use ceiling — recipe 9 is this comment turned into a setup. The same thread includes the .gitignore + !/context/ re-include trick.

For various reasons (RL, inherent structure of code) iterative grepping is unreasonably effective. Interestingly Cursor does use embedding vectors for codebase indexing... Seems like sometimes Cursor has a better understanding of the vibe of my codebase than Claude code, maybe this is part of it.

mips_avatar (HN) · Hacker News

The honest middle position in the grep-vs-embeddings debate: grep is unreasonably effective, and the index still catches things grep can't phrase.


The contrarian take

Not everyone is sold. The most useful critique comes from ramoz (HN), who never understood what the index was doing for them:

I kept wondering why Cursor was indexing my codebase, it was never clear. Anyway context to me enables a lot more assurance and guarantees. RAG never did.

ramoz (HN) · Hacker News

From an HN thread on deterministic context assembly versus retrieval.


Half of this lands. Cursor indexes on open with no ceremony, and a developer who never types @Codebase pays the indexing cost for nothing — recipe 1 exists because the index’s contents are invisible until you go looking. But “RAG never did” aged badly: Cursor’s published A/B tests measured 12.5% higher answer accuracy with semantic search on, and higher code retention on large repos. My take: deterministic context wins when you already know what to include; the index wins when you don’t know what you don’t know. Discovery is the whole point.

One more alternative worth naming: if what you actually want is semantic code search for Claude Code or any MCP client, that’s the Serena MCP server (/servers/serena) — symbol-level retrieval backed by a language server instead of embeddings. The trade-off is the usual skill-vs-MCP one: this skill costs ~100 idle tokens and configures Cursor’s built-in index; Serena’s tool schemas load every turn but bring retrieval to editors that have no index at all. Our Serena MCP complete guide covers that setup end to end — recipe 10 is the bridge between the two worlds.

Reading on Cursor’s index

The engineering behind the index is unusually well documented — by Cursor and by third parties who reverse-engineered it. These five are the sources this cookbook’s claims trace back to, and worth reading in this order.

Gotchas (the four that bite)

Sourced from the skill’s troubleshooting table and the threads above. You’ll hit at least one of these in your first week.

.gitignore'd files are already excluded

Cursor auto-skips everything in .gitignore, so copying it into .cursorignore is pure noise. The rules that matter cover git-TRACKED junk: committed dist/ output, vendored deps, large fixtures. Audit first (recipe 1), then write rules for what's actually in the index.

.cursorignore's blast radius is all AI features

A path listed in .cursorignore doesn't only leave the index; it becomes invisible to @Files too. If you over-ignore, the symptom is Cursor claiming a file doesn't exist when you reference it. Anything you still want to reach by hand belongs in .cursorindexingignore instead (recipe 3).

The index lags your refactor by design

Change detection re-indexes modified files roughly every 10 minutes via Merkle-tree diffing. Stale @Codebase answers right after a big rename are expected, not broken. Wait it out, or force it: Cmd+Shift+P > Cursor: Resync Index (recipe 6's escalation ladder).

No network, no index

Embedding computation happens through Cursor's API and the vectors live in Turbopuffer — there is no offline mode. Air-gapped machines never get @Codebase, full stop. For that constraint, the LSP-backed route in recipe 10 (Serena's symbol graph) is the workaround, since language servers run locally.

Pairs well with

Curated from the cookbook’s actual pairings: the Cursor operations skills that extend this one (cursor-performance-tuning, cursor-indexing-issues, cursor-privacy-settings), the context-budget skills the ignore-file recipes feed (cursor-context-management, context-optimizer), and the retrieval servers from recipes 9 and 10. For the full client picture, the Cursor client page lists every MCP server that runs inside the editor this skill tunes.

Two posts that compose well with this cookbook: the Serena MCP complete guide is the semantic-retrieval sibling for editors without a built-in index, and Claude Code best practices covers the workflow habits that make skills like this one pay off.

Frequently asked questions

What does the cursor-codebase-indexing skill for Claude actually do?

It turns Claude Code into the operator for Cursor's indexing system. Ask it to set up indexing and it audits what's in the index, writes .cursorignore and .cursorindexingignore rules, scopes monorepos, fixes stale or stuck indexes (resync, cache deletion, the Linux inotify limit), and drafts @Codebase query patterns. It configures and troubleshoots the index — Cursor's own indexer still does the embedding work.

How does Cursor codebase indexing work under the hood?

Four stages: your files are split into syntactic chunks, each chunk becomes an embedding vector, the vectors land in Turbopuffer (a cloud vector store), and an @Codebase question is embedded and matched by nearest-neighbor search. A Merkle tree detects changes so only modified files re-index, roughly every 10 minutes. Small projects index in seconds; 50K+ file repos can take hours on the first pass.

Is Cursor codebase indexing safe for proprietary code?

Cursor's stated design: code is not stored server-side in plaintext — only embeddings plus obfuscated metadata — and with Privacy Mode on, embeddings are computed with zero data retention at the provider. What it can't do is run offline: indexing needs network access to the embedding API, so air-gapped environments get no @Codebase. Recipe 8 turns these facts into the checklist your security team will ask for.

What's the difference between .cursorignore and .cursorindexingignore?

Blast radius. .cursorignore hides files from indexing AND every AI feature — an ignored file can't even be pulled in with @Files. .cursorindexingignore only keeps files out of the index; you can still reference them explicitly. Secrets and build output belong in .cursorignore; big fixtures you occasionally inspect belong in .cursorindexingignore. Cursor also auto-excludes everything in .gitignore, so most rules are only needed for git-tracked files.

Can I get Cursor-style codebase indexing in Claude Code?

Not as embeddings — Claude Code ships grep-based search, no vector index. The closest equivalent is the Serena MCP server, which gives Claude semantic, symbol-level retrieval backed by a language server: find a symbol, find its references, no index maintenance. Recipe 10 maps each Cursor feature to its Claude Code counterpart, and our Serena deep-dive covers the setup end to end.

Why does @Codebase return no results or stale results?

No results usually means the index isn't built yet (wait for "Indexed" in the status bar) or the file is excluded via .gitignore or .cursorignore. Stale results mean the index hasn't caught up with a refactor — change detection runs about every 10 minutes. The escalation ladder: wait, then Cmd+Shift+P > Cursor: Resync Index, then delete the local cache and restart for a full re-index.

Does @Codebase replace grep and @Files?

No — it's the discovery tool, not the precision tool. @Codebase matches meaning, so it finds code you can't name yet, at a high context cost. Once you know the file, @Files is cheaper and exact; once you know the string, editor search is free. The working pattern from recipe 5: @Codebase to find, @Files to work, grep when you can already spell it.
