Cursor Codebase Indexing skill: 10 setup & tuning recipes

cursor-rules-config

Sets up .cursor/rules alongside the ignore files in one pass.

cursor-mcp-installer

Wires MCP servers into the same Cursor workspace you just audited.

Write a .cursorignore that cuts the index to signal

One file in the project root decides what Cursor's AI can see. The skill generates it in .gitignore syntax and excludes build artifacts, dependencies, minified bundles, data dumps, and secrets.

For: Teams whose @Codebase results keep surfacing dist/ output instead of source.

The prompt

Use the cursor-codebase-indexing skill. Generate a .cursorignore for this repo (Next.js app, Python scripts/ directory, Playwright e2e/). Cover four groups with comments: build artifacts, dependencies, generated files (minified JS, source maps, lockfiles), and large data files. Add a defense-in-depth section for secrets — .env variants and any credentials directories — even though they're gitignored. Only include rules for paths that actually exist or plausibly will; don't paste a generic template.

What the output looks like

# .cursorignore
# Build artifacts
.next/
dist/
out/

# Dependencies
node_modules/
.venv/

# Generated files
*.min.js
*.map
*.lock

# Large data files
*.csv
*.sqlite
fixtures/

# Secrets — defense in depth (also in .gitignore)
.env*
**/credentials/

One-line tweak

Remember the blast radius: .cursorignore hides files from ALL AI features, not only the index — don't ignore anything you still want to @Files.
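
To sanity-check a draft before committing it, a small script can preview which paths the rules would hide. The sketch below is an approximation, not Cursor's matcher: it assumes plain gitignore semantics and handles only slash-free patterns (name/, *.ext, and a leading **/). The helper names load_patterns and is_ignored are hypothetical.

```python
from fnmatch import fnmatchcase

def load_patterns(text):
    """Return the rule lines of an ignore file, skipping blanks and comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]

def is_ignored(path, patterns):
    """Approximate gitignore matching for slash-free patterns only.

    Supports 'name/', '*.ext', and a leading '**/'. Negations ('!...') and
    anchored patterns ('/x', 'a/b') are skipped rather than guessed at.
    """
    parts = path.split("/")
    for pattern in patterns:
        if pattern.startswith("!"):
            continue
        if pattern.startswith("**/"):
            pattern = pattern[3:]
        dir_only = pattern.endswith("/")
        name = pattern.rstrip("/")
        if "/" in name:
            continue  # anchored or nested pattern: outside this sketch
        # A directory rule can only match a parent directory of the file.
        candidates = parts[:-1] if dir_only else parts
        if any(fnmatchcase(part, name) for part in candidates):
            return True
    return False

RULES = load_patterns("""
# Build artifacts
.next/
dist/
out/
# Generated files
*.min.js
*.map
*.lock
# Secrets
.env*
**/credentials/
""")

for path in ["src/app/page.tsx", "dist/main.js", ".env.local",
             "scripts/credentials/key.json", "package-lock.json"]:
    print(f"{path}: {'hidden' if is_ignored(path, RULES) else 'visible'}")
```

The last path is the useful surprise: *.lock catches yarn.lock but not package-lock.json or pnpm-lock.yaml, so list those by name if your repo uses them.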

Pairs with

cursor-performance-tuning

The broader latency pass once the index itself is lean.

filesystem

Lets any MCP client apply the same exclusion list to its file access.

Split ignore duty with .cursorindexingignore

Some files should stay out of the index but remain reachable when you point at them. .cursorindexingignore excludes from indexing only — @Files still works on those paths.

For: Anyone with big test fixtures or generated docs they occasionally need to reference explicitly.

The prompt

Use the cursor-codebase-indexing skill. Our e2e/recordings/ and tests/fixtures/ directories are huge and pollute @Codebase results, but I still open individual fixtures with @Files while debugging. Set this up so they're excluded from the index but stay accessible to AI features. Explain in one comment block at the top of each file which ignore file does what, so the next person doesn't merge them back into one.

What the output looks like

# .cursorindexingignore
# Excluded from the INDEX only — @Files still works.
tests/fixtures/
e2e/recordings/
docs/.vitepress/dist/

# .cursorignore
# Hidden from indexing AND all AI features.
# Nothing here should ever need an @Files mention.
dist/
.env*

One-line tweak

When in doubt, start in .cursorindexingignore — it's the reversible choice; promote paths to .cursorignore only when the AI should never see them.
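
The two-file split reduces to a three-row truth table. A minimal sketch that encodes only the behavior described above; the names Visibility, indexed, and mentionable are invented here for illustration and do not model Cursor internals.

```python
from typing import NamedTuple

class Visibility(NamedTuple):
    indexed: bool      # can show up in @Codebase results
    mentionable: bool  # still reachable through an explicit @Files mention

def visibility(in_cursorignore: bool, in_indexingignore: bool) -> Visibility:
    """Map ignore-file membership to what the AI features can reach."""
    if in_cursorignore:
        # .cursorignore wins: hidden from the index and from all AI features.
        return Visibility(indexed=False, mentionable=False)
    if in_indexingignore:
        # .cursorindexingignore: out of the index, still mentionable.
        return Visibility(indexed=False, mentionable=True)
    return Visibility(indexed=True, mentionable=True)

print(visibility(in_cursorignore=False, in_indexingignore=True))
```

Reading the table this way makes the reversibility argument concrete: moving a path from the second row to the first removes a capability, while the reverse move only restores one.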

Pairs with

cursor-context-management

The full context-budget strategy this two-file split feeds into.

ripgrep

Exact-match search still reaches excluded paths when you need it.

Scope a monorepo so indexing finishes today

A 50K-file monorepo can take hours to index from the root. Open the package you're working on instead, or exclude sibling packages at the root — the skill writes both setups.

For: Monorepo developers whose first index never seems to complete.

The prompt

Use the cursor-codebase-indexing skill. Our monorepo has packages/web, packages/mobile, packages/admin, packages/api, and packages/shared. I work in api and shared. Give me both strategies: (a) the cursor command to open only the package directory, and (b) a root-level .cursorignore that excludes the packages I don't touch while keeping api and shared indexed. Tell me which one you'd pick for my case and why, in two sentences.

What the output looks like

# (a) Open the package, not the monorepo:
cursor /path/to/monorepo              # indexes everything — slow
cursor /path/to/monorepo/packages/api # this package only — fast

# (b) Root .cursorignore — keep root open, scope the index:
packages/web/
packages/mobile/
packages/admin/
# packages/api/    ← not listed → indexed
# packages/shared/ ← not listed → indexed

One-line tweak

Pick (b) when api imports from shared — cross-package @Codebase answers need both in one index.
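
Strategy (b) is easy to regenerate whenever the package list changes. A minimal sketch, assuming every package lives directly under one packages/ directory; scoped_cursorignore is a hypothetical helper, not part of the skill.

```python
def scoped_cursorignore(packages, keep, root="packages"):
    """Build root-level ignore rules that exclude every package not in `keep`."""
    keep = set(keep)
    unknown = keep - set(packages)
    if unknown:
        raise ValueError(f"not in the monorepo: {sorted(unknown)}")
    lines = ["# Root .cursorignore: packages not listed here stay indexed"]
    lines += [f"{root}/{name}/" for name in sorted(packages) if name not in keep]
    return "\n".join(lines) + "\n"

print(scoped_cursorignore(
    ["web", "mobile", "admin", "api", "shared"], keep=["api", "shared"]))
```

The ValueError on an unknown name is deliberate: a typo in the keep list would otherwise silently exclude the package you meant to index.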

Pairs with

cursor-multi-repo

Workflows for when the work genuinely spans several repos.

turborepo

Monorepo task-graph hygiene pairs naturally with index scoping.

Ask @Codebase questions that actually hit

@Codebase is nearest-neighbor search over embeddings — it matches meaning, not keywords. The skill teaches the query shapes that exploit that, and when to hand off to @Files or plain grep.
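
To make "meaning, not keywords" concrete, here is a toy nearest-neighbor lookup. Everything in it is an assumption for illustration: the three-number vectors are hand-made stand-ins for real model embeddings, and the retry-button and audit paths are hypothetical. It shows the general technique, not Cursor's retrieval code.

```python
import math

def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d vectors; axes loosely mean (payments, failure handling, logging).
CHUNKS = [
    ("src/billing/charge.ts", "re-submits declined charges with backoff", (0.9, 0.8, 0.1)),
    ("src/ui/retry-button.tsx", "renders a Retry button", (0.0, 0.4, 0.0)),
    ("src/logging/audit.ts", "writes audit events to the log sink", (0.1, 0.0, 0.9)),
]

def nearest(query_vec, chunks, k=1):
    """Return the paths of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[2]), reverse=True)
    return [path for path, _text, _vec in ranked[:k]]

def keyword_hits(word, chunks):
    """Plain substring search over chunk text, for comparison."""
    return [path for path, text, _vec in chunks if word.lower() in text.lower()]

# Stand-in vector for "where do we retry failed payment attempts?"
QUERY_VEC = (0.8, 0.9, 0.0)
print(nearest(QUERY_VEC, CHUNKS))      # semantic match
print(keyword_hits("retry", CHUNKS))   # literal match
```

In this toy setup the semantic lookup lands on the billing code even though it never contains the word "retry", while the keyword search finds only the UI button. That gap is why behavior-shaped questions tend to work better than symbol names.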

For: Developers who type keyword-style queries into @Codebase and get noise back.

The prompt

Use the cursor-codebase-indexing skill. I'm new to this codebase and need to understand how payments work. Give me five @Codebase queries phrased for semantic search — questions about behavior and flow, not symbol names — covering: where payment processing starts, how failures are retried, where we talk to the payment provider, how refunds differ from charges, and what gets logged. Then show the handoff: once @Codebase finds the file, what do I switch to for the actual edit and why.

What the output looks like

@Codebase how does the payment processing flow work?
@Codebase where do we retry failed payment attempts?
@Codebase find all places where we call the payment provider API
@Codebase how is a refund handled differently from a charge?
@Codebase what gets logged while a payment is processed?

# Discovery → precision handoff:
# 1. @Codebase surfaces src/billing/charge.ts
# 2. switch to @Files src/billing/charge.ts to edit
# @Codebase = high context cost — use it to FIND
# @Files    = low context cost  — use it to WORK

One-line tweak

If you already know the exact string ('PaymentIntentFailed'), skip embeddings entirely: Ctrl+Shift+F (Cmd+Shift+F on macOS) is free and exact.

Pairs with

cursor-ai-chat

Chat workflows that consume what @Codebase discovery surfaces.

cursor-composer-workflows

Multi-file edits once discovery has mapped the territory.

Un-stick a stale or stuck index

Post-refactor, @Codebase keeps returning the old file layout. The skill runs the escalation ladder: wait out the re-index window, force a resync, then nuke the local cache.

For: Anyone whose search results reference files that were renamed or deleted last week.

The prompt

Use the cursor-codebase-indexing skill. We renamed src/utils/ to src/lib/ two days ago and @Codebase still answers with the old paths. Give me the fix as an escalation ladder: what the normal re-index latency is and why (change detection), the command-palette resync, and the local cache deletion path for macOS as the last resort. For each rung, tell me how to confirm it worked before escalating.

What the output looks like

# Rung 1 — wait: changed files re-index ~every 10 min
#   (Merkle-tree change detection, modified files only)

# Rung 2 — force it:
Cmd+Shift+P > Cursor: Resync Index
#   confirm: status bar shows indexing progress

# Rung 3 — nuke the local cache, then restart Cursor:
macOS:   ~/Library/Application Support/Cursor/Cache/
Linux:   ~/.config/Cursor/Cache/
Windows: %APPDATA%\Cursor\Cache\

One-line tweak

High CPU right after rung 3 is the full re-embedding pass, not a bug — it subsides when the status bar flips to Indexed.
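
Rung 1 leans on Merkle-tree change detection, which is why only modified files get re-embedded. A minimal sketch of the general technique, not Cursor's implementation: the nested-dict snapshot format and the function names build and changed are assumptions for illustration.

```python
import hashlib

def sha(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

def build(node):
    """Turn a snapshot (nested dicts, file contents as str leaves) into a hash tree."""
    if isinstance(node, str):  # a file: hash its content
        return {"hash": sha(node), "children": None}
    children = {name: build(child) for name, child in sorted(node.items())}
    # A directory's hash covers its children's names and hashes.
    combined = "".join(f"{name}:{c['hash']};" for name, c in children.items())
    return {"hash": sha(combined), "children": children}

def changed(old, new, prefix=""):
    """List new or modified file paths; equal subtrees cost one hash compare."""
    if old is not None and old["hash"] == new["hash"]:
        return []  # identical subtree: skip everything beneath it
    if new["children"] is None:
        return [prefix]
    old_children = (old or {}).get("children") or {}
    paths = []
    for name, child in new["children"].items():
        path = f"{prefix}/{name}" if prefix else name
        paths += changed(old_children.get(name), child, path)
    return paths

before = build({"src": {"utils": {"date.ts": "export {}"}, "app.ts": "main()"}})
after = build({"src": {"lib": {"date.ts": "export {}"}, "app.ts": "main()"}})
print(changed(before, after))  # only the renamed directory's files
```

A rename looks like brand-new files to this comparison, which matches the symptom in the prompt: the new paths need a fresh pass, and the sketch does not report deletions, so stale entries for the old paths would need separate cleanup.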

Pairs with

cursor-indexing-issues

The dedicated symptom-by-symptom indexing troubleshooter.

cursor-chat-history

Recovers past Cursor conversations when debugging what changed.

Fix Linux file-watcher exhaustion on large repos

On Linux, big projects silently hit the inotify watch limit and indexing stalls without an obvious error. Two sysctl commands fix it; the skill knows both.

For: Linux developers whose index sticks at N% on repos that work fine for macOS teammates.

The prompt

Use the cursor-codebase-indexing skill. Indexing on my Ubuntu machine stalls partway through a ~80K-file repo; the same repo indexes fine on a colleague's Mac. Check whether I'm hitting the inotify file-watcher limit: show me how to read the current limit, raise it temporarily to test, and make the change permanent if it fixes the stall. Note what value you're setting and why that number.

What the output looks like

# Check the current limit
cat /proc/sys/fs/inotify/max_user_watches

# Raise it (temporary — gone after reboot)
sudo sysctl fs.inotify.max_user_watches=524288

# Make it permanent
echo "fs.inotify.max_user_watches=524288" | \
  sudo tee -a /etc/sysctl.conf
sudo sysctl -p

One-line tweak

Try recipe 2 first: a lean .cursorignore may keep you under the default limit without touching sysctl at all.
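
Before editing sysctl, it helps to compare the limit with the size of the repo. A rough sketch under one loud assumption: whether the watcher spends one watch per directory or one per file is not stated here, so the script counts both and treats their sum as a worst case. The helper names and the default skip list are hypothetical.

```python
import os
from pathlib import Path

LIMIT_FILE = "/proc/sys/fs/inotify/max_user_watches"

def read_watch_limit(limit_file=LIMIT_FILE):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        return int(Path(limit_file).read_text().strip())
    except (OSError, ValueError):
        return None

def count_entries(root, skip_dirs=()):
    """Count (directories, files) under root, pruning any names in skip_dirs."""
    dirs = files = 0
    for _current, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        dirs += 1  # the directory being visited, root included
        files += len(filenames)
    return dirs, files

def report(root, limit_file=LIMIT_FILE, skip_dirs=(".git",)):
    """Summarize the limit against a worst case of one watch per entry."""
    limit = read_watch_limit(limit_file)
    dirs, files = count_entries(root, skip_dirs)
    if limit is None:
        verdict = "limit unreadable (not Linux?)"
    elif dirs + files > limit:
        verdict = "may exceed the limit"
    else:
        verdict = "fits under the limit"
    return f"limit={limit} dirs={dirs} files={files} -> {verdict}"
```

Calling report(".") from the repo root gives a one-line answer; if the worst case already fits under the current limit, the stall probably has another cause.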

Pairs with

cursor-common-errors

Catalogs the other silent failures that present like this one.

Run a privacy and compliance pass before team rollout

Indexing sends embeddings to cloud storage. Before a security team signs off, someone has to answer where code goes, what's retained, and what breaks offline. The skill drafts that answer.

For: Tech leads who need indexing approved by security before the team can use @Codebase.

The prompt

Use the cursor-codebase-indexing skill. Draft the privacy assessment our security team will ask for before we enable codebase indexing org-wide: where embeddings are stored, whether plaintext code is retained server-side, what metadata exists, what Privacy Mode changes, and what happens for the one repo that lives on an air-gapped subnet. Format it as a checklist with a verdict line per item, and flag anything that needs an exception rather than an approval.

What the output looks like

PRIVACY ASSESSMENT — Cursor codebase indexing
[x] Plaintext code stored server-side: NO — embeddings
    + obfuscated metadata only
[x] Vector storage: Turbopuffer (cloud-hosted)
[x] Privacy Mode ON: zero data retention at provider
[!] Air-gapped repo: indexing needs network access to
    the embedding API — UNAVAILABLE offline → exception
[x] .cursorignore covers .env*, **/credentials/ (recipe 2)
→ verdict: approve org-wide; exempt air-gapped repo

One-line tweak

Re-run the assessment when Cursor ships indexing changes — the answers here are version-dependent, and your security team will ask.

Pairs with

cursor-privacy-settings

Configures the Privacy Mode toggles this assessment depends on.

cursor-compliance-audit

Extends the pass to the rest of the Cursor deployment.

Index your private docs on purpose

The index isn't only for code. Scrape internal docs to Markdown, drop them in a context/ folder, and let @Codebase answer questions from documentation Cursor would otherwise never see.

For: Teams whose critical knowledge lives in private docs no public docs-lookup tool can reach.

The prompt

Use the cursor-codebase-indexing skill. I have internal API docs and two vendored-library references scraped to Markdown. Set up a context/ folder that is excluded from git but INCLUDED in Cursor's index — I want @Codebase to answer questions from these docs, but they don't belong in version control. Show me the .gitignore and .cursorignore entries that produce that exact split, and explain why the re-include rule is needed given that Cursor skips gitignored files by default.

What the output looks like

# .gitignore — keep scraped docs out of the repo
/context/

# .cursorignore — re-include them for the index
# (Cursor auto-excludes gitignored paths; the negation
#  rule opts this folder back in)
!/context/

context/
├── internal-billing-api.md
├── vendor-sdk-reference.md
└── pg-partitioning-notes.md

@Codebase how do we handle billing webhook retries?

One-line tweak

This is the trick an HN user used to index entire transcribed textbooks — the index doesn't care whether Markdown came from your repo or a PDF.

Pairs with

context-optimizer

Keeps the docs corpus from blowing the context budget downstream.

qdrant

The DIY route: your own vector store when you outgrow this trick.

Recreate the workflow in Claude Code

Claude Code has no embedding index — and mostly doesn't need one. The skill maps each Cursor indexing feature to its Claude Code equivalent, with Serena covering the semantic gap.

For: Developers running both editors who want one mental model for code search in each.

The prompt

Use the cursor-codebase-indexing skill as reference. I use Cursor and Claude Code on the same repo. Build me the translation table: for @Codebase semantic search, @Files, @Folders, and exact text search, what is the Claude Code equivalent, and where does the Serena MCP server fit? Be honest about the architectural difference — embeddings versus a language-server symbol graph — and when each approach finds things the other misses.

What the output looks like

Cursor                    → Claude Code equivalent
@Codebase (embeddings)    → Serena MCP: symbol search +
                            references, LSP-backed
@Files file mention       → @path/to/file in the prompt
@Folders                  → directory mention / Glob
Ctrl+Shift+F exact match  → Grep tool (ripgrep)
index maintenance         → none — Serena queries the
                            language server on demand

# embeddings match MEANING; the symbol graph
# matches STRUCTURE. Different misses, both useful.

One-line tweak

Concept queries ('where do we throttle?') favor embeddings; structural queries ('who calls this?') favor the symbol graph — route accordingly.

Pairs with

serena

Semantic, LSP-backed code retrieval for Claude Code and any MCP client.

mcp-serena

The skill-side wrapper for driving Serena's tools well.

Community signal

Three voices from people running the index on real codebases: the at-scale endorsement, the creative-use ceiling, and the honest middle of the grep-versus-embeddings debate. All three are verbatim from Hacker News.

“AI can search your code today. In cursor this is called “codebase indexing”. We have some million(s) lines of code, orders of magnitude smaller than Facebook, but definitely larger than the average startup. We search with AI tools, through Q&A, and for AI-driven code mods.”

jitl (HN) · Hacker News

The at-scale endorsement: an engineer on a multi-million-line codebase using the index for Q&A and code mods, not autocomplete party tricks.

“I scrape the docs to Markdown, stick them into a “context” folder, and use Cursor's vector codebase indexing. This allows the agent to literally ask questions like “how do I do ABC with library XYZ?” and the vector database delivers a chunked answer from all available documentation.”

electroly (HN) · Hacker News

The creative-use ceiling — recipe 9 is this comment turned into a setup. The same thread includes the .gitignore + !/context/ re-include trick.

“For various reasons (RL, inherent structure of code) iterative grepping is unreasonably effective. Interestingly Cursor does use embedding vectors for codebase indexing... Seems like sometimes Cursor has a better understanding of the vibe of my codebase than Claude code, maybe this is part of it.”

mips_avatar (HN) · Hacker News

The honest middle position in the grep-vs-embeddings debate: grep is unreasonably effective, and the index still catches things grep can't phrase.

The contrarian take

Not everyone is sold. The most useful critique comes from ramoz (HN), who never understood what the index was doing for them:

“I kept wondering why Cursor was indexing my codebase, it was never clear. Anyway context to me enables a lot more assurance and guarantees. RAG never did.”

ramoz (HN) · Hacker News

From an HN thread on deterministic context assembly versus retrieval.