CLAUDE.md vs AGENTS.md vs Skills — what the evals actually say
Three formats now compete to tell a coding agent how your project works: CLAUDE.md (Claude Code’s instructions file), AGENTS.md (the open cross-tool convention read by Codex, Cursor, and dozens more), and Skills (Anthropic’s SKILL.md capability packs). In early 2026 the evals started landing — and they don’t agree. Vercel says AGENTS.md beat Skills. A university study says context files can lower success rates. This is the neutral read of the evidence, plus the one thing every source agrees on.

TL;DR
- They aren’t rivals so much as layers. CLAUDE.md and AGENTS.md are the same kind of artifact — an always-on markdown instructions file — read by different tools. Skills are a different mechanism entirely: triggered, progressively-disclosed capability packs.
- Vercel’s eval favored AGENTS.md over Skills for teaching framework knowledge: “A compressed 8KB docs index embedded directly in AGENTS.md achieved a 100% pass rate, while skills maxed out at 79%.” Their explanation: in 56% of cases the Skill was never even invoked.
- A university study found the opposite can happen. ETH Zurich researchers’ “Evaluating AGENTS.md” (Feb 2026) reports context files “tend to reduce task success rates” and raise inference cost “by over 20%” on their benchmark.
- The reconciling variable is quality and length. Augment Code’s framing: “A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all.”
- Claude Code reads CLAUDE.md, not AGENTS.md. The documented fix is one line: a CLAUDE.md that imports AGENTS.md.
If you want the definitional differences between Skills, subagents, plugins, and hooks rather than the format debate, read Skills vs Subagents vs Plugins vs Hooks and the Skills vs MCP vs Subagents vs CLI decision matrix. For how to write a good CLAUDE.md, see the annotated CLAUDE.md walkthrough. This post is about which format performs best, and why the answer is contested.
What each format actually is
The three are constantly conflated, so the cleanest starting point is one sentence each, by mechanism rather than by vendor.
CLAUDE.md
Claude Code’s project-instructions file. Per Anthropic’s docs, it’s “loaded into the context window at the start of every session.” You write it; Claude reads it every time. Lives at ./CLAUDE.md (project), ~/.claude/CLAUDE.md (user), or a managed policy path (org).
AGENTS.md
The open, cross-tool version of the same idea — “a simple, open format for guiding coding agents,” per the project README. The site describes it as “a README for agents.” Same always-on instruction-file mechanism, but read by Codex, Cursor, Aider, Gemini CLI, and the wider ecosystem rather than one tool.
Skills (SKILL.md)
A different mechanism: folder-based capability packs that use progressive disclosure. Only the name + description (~100 tokens) load at startup; the full SKILL.md body and any bundled scripts load only when the agent decides the Skill is relevant.
That last distinction is the crux of the whole debate. CLAUDE.md and AGENTS.md are passive context — always present, no decision required. Skills are active capabilities — the agent has to recognize the moment and pull them in. Both designs are deliberate. Always-on context guarantees the instruction is there; progressive disclosure keeps a large library from filling the context window. Whether always-on or on-demand wins is exactly what the evals are fighting about.
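A minimal sketch of the two loading models may make the split concrete. It assumes a SKILL.md with flat `key: value` frontmatter; the helper names (`split_skill`, `startup_context`, `context_after_trigger`) and the sample Skill are hypothetical and do not reflect how any particular agent assembles its context.

```python
def split_skill(skill_md: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, body).

    Handles only flat `key: value` frontmatter between two `---` lines.
    """
    _, header, body = skill_md.split("---", 2)
    fields = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body.strip()


def startup_context(always_on: str, skills: list[str]) -> str:
    """Session-start context: the whole instructions file, plus one
    metadata line per Skill. Skill bodies are left out on purpose."""
    lines = [always_on]
    for skill_md in skills:
        fields, _body = split_skill(skill_md)
        lines.append(f"{fields['name']}: {fields['description']}")
    return "\n".join(lines)


def context_after_trigger(context: str, skill_md: str) -> str:
    """Context once the agent has decided the Skill is relevant."""
    _fields, body = split_skill(skill_md)
    return f"{context}\n{body}"


# Hypothetical sample inputs.
AGENTS_MD = "Run `npm test` before committing."
SKILL_MD = """---
name: release-notes
description: Draft release notes from merged PRs.
---
Step 1: list merged PRs.
Step 2: group them by label.
"""

start = startup_context(AGENTS_MD, [SKILL_MD])
triggered = context_after_trigger(start, SKILL_MD)
```

In this sketch the always-on file is present in `start` unconditionally, while the Skill's steps appear only in `triggered`. If the agent never makes that second call, the body never reaches the model.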
What the evals say — and where they disagree
Three serious data points landed between January and February 2026. Two are vendor evals (Vercel, Augment Code); one is an academic study. They point in different directions, and the disagreement is the interesting part — not a reason to dismiss any of them.
Vercel: AGENTS.md outperformed Skills
Vercel’s Jude Gao published “AGENTS.md outperforms skills in our agent evals” on January 27, 2026. The test: teach a coding agent enough Next.js framework knowledge to pass a build/lint/test gate, and compare delivery via a Skill versus an embedded docs index in AGENTS.md. The headline:
“A compressed 8KB docs index embedded directly in AGENTS.md achieved a 100% pass rate, while skills maxed out at 79%.”
The mechanism Vercel proposes is the passive-vs-active split above. A Skill has to be chosen. From the post: “In 56% of eval cases, the skill was never invoked” — the agent had the Skill available, could have used it, and simply didn’t reach for it. Their summary of why AGENTS.md sidesteps that: “With AGENTS.md, there’s no moment where the agent must decide ‘should I look this up?’” Because the docs index sits in context from the first token, the retrieval decision never has to be made.
The result is real, but read the scope carefully: this is one framework-knowledge task, one configuration of Skill, and the clever bit is the 8KB compressed docs index, not just “AGENTS.md.” The finding is “always-on docs beat an un-triggered Skill for this kind of knowledge,” which is narrower and more useful than “AGENTS.md is better than Skills.” This is the same family of argument as the “just put it in context” school we covered in the MCP context-bloat piece — applied to docs instead of tools.
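To make the "docs index" idea concrete, here is a rough sketch that groups doc paths by directory into one compact line each and checks the result against an 8KB budget. The line format, the `build_docs_index` name, and the sample paths are assumptions for illustration; this is not Vercel's actual index format or tooling.

```python
from collections import defaultdict
from pathlib import PurePosixPath

BUDGET_BYTES = 8 * 1024  # the 8KB figure quoted above


def build_docs_index(paths: list[str]) -> str:
    """Group doc file paths by directory, one compact line per directory."""
    by_dir: dict[str, list[str]] = defaultdict(list)
    for path in sorted(paths):
        doc = PurePosixPath(path)
        by_dir[str(doc.parent)].append(doc.name)
    return "\n".join(
        directory + ":{" + ",".join(names) + "}"
        for directory, names in by_dir.items()
    )


# Hypothetical doc tree.
doc_paths = [
    "docs/routing/pages.md",
    "docs/routing/layouts.md",
    "docs/caching/overview.md",
]

index = build_docs_index(doc_paths)
fits_budget = len(index.encode("utf-8")) <= BUDGET_BYTES
```

The point of an index like this is that it tells the agent where the docs are without pasting their contents, which is what keeps the always-on cost small.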
Augment: the variable is quality, not format
Augment Code’s post “A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all” reframes the whole question. Run on their internal AuggieBench suite (comparing agent output to “golden PRs” that passed senior-engineer review), their finding is that the file matters more than the format:
“A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all.”
The shape of the curve is the takeaway. Augment reports that well-built files in the 100–150 line range delivered double-digit improvements across their metrics in mid-size modules, with the best files giving “a quality jump equivalent to upgrading from Haiku to Opus.” But the gains reverse past that length: once the main file grew beyond ~150 lines, the improvements started shrinking, and the worst files — especially against modules already buried in 500K+ characters of surrounding docs — produced output worse than having no AGENTS.md at all.
Notice Augment’s own advice mirrors how Skills work: cover the common case at a high level, push detail into separate reference files. That is progressive disclosure by hand. The two camps converge more than the headlines suggest.
The ETH Zurich counter-result
The sharpest counterweight is academic. In “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (arXiv:2602.11988, submitted February 12, 2026), Thibaud Gloaguen, Niels Mündler and co-authors test context files across multiple agents and models — Claude Code, Codex, Qwen Code — on SWE-bench-style tasks and on real repositories that ship their own committed context files. The abstract’s core sentence:
“Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%.”
Their behavioral explanation is worth holding onto: context files “encourage broader exploration” — more testing, more file traversal — and agents “tend to respect their instructions.” That obedience is the trap. If the file asks for work the task doesn’t need, a well-behaved agent does that work anyway, spending tokens and inviting mistakes. Their conclusion is explicit: “unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.”
This does not contradict Vercel or Augment so much as bound them. Vercel’s win came from a compressed, relevant docs index. Augment’s win came from short, focused files and reversed when they grew. The ETH Zurich result is what happens when context files are generated or written without that discipline. Every source, read together, says the same thing: length and relevance are the levers, and the default failure mode is too much.
The Claude Code friction (and the one-line fix)
There is a practical wrinkle underneath the format debate: Claude Code does not read AGENTS.md natively. Anthropic’s own memory documentation states it directly — “Claude Code reads CLAUDE.md, not AGENTS.md” — which is awkward if your repo already standardized on AGENTS.md for Codex, Cursor, and the rest. This is the source of the recurring community ask for AGENTS.md support and the symlink hacks people trade around it.
The documented fix is one line. Keep AGENTS.md as the single source of truth and import it from CLAUDE.md:
    # CLAUDE.md
    @AGENTS.md
    ## Claude Code
    Use plan mode for changes under `src/billing/`.

Claude Code expands the @AGENTS.md import at session start, then appends any Claude-specific instructions below it, so both tool families read the same file and nothing drifts. If you don’t need Claude-specific additions, a symlink works too:

    ln -s AGENTS.md CLAUDE.md

On Windows, where symlinks need elevated privileges, the @AGENTS.md import is the cleaner path. And if you run /init in a repo that already has an AGENTS.md, Claude Code reads it and folds the relevant parts into the generated CLAUDE.md; it also reads other tool configs like .cursorrules and .windsurfrules. The friction is real, but it’s a one-line workaround, not a blocker.
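For teams that want to apply the import pattern across many repos, it can be scripted. The sketch below is one possible approach: `ensure_claude_md` is a hypothetical helper, and it assumes you want to leave any existing CLAUDE.md untouched. The demo runs in a temporary directory.

```python
import tempfile
from pathlib import Path


def ensure_claude_md(repo: Path) -> bool:
    """Create a CLAUDE.md that imports AGENTS.md, unless one already exists.

    Returns True if the file was written, False if nothing was changed.
    """
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    if not agents.is_file() or claude.exists():
        return False
    claude.write_text("@AGENTS.md\n", encoding="utf-8")
    return True


# Demo in a throwaway directory so no real repo is touched.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "AGENTS.md").write_text("Run the tests before committing.\n", encoding="utf-8")
    created = ensure_claude_md(repo)        # writes CLAUDE.md
    created_again = ensure_claude_md(repo)  # second call is a no-op
    claude_text = (repo / "CLAUDE.md").read_text(encoding="utf-8")
```

Refusing to overwrite is a deliberate choice here: a repo that already has a hand-written CLAUDE.md probably needs a human to decide how to merge it with AGENTS.md.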
They’re not mutually exclusive — they compose
The framing as a three-way fight is misleading. In a real repo, all three coexist, each carrying a different kind of knowledge.
- One AGENTS.md as the canonical instructions file. Build/test commands, conventions, architecture, “always do X” rules — the context every task needs. Keep it under ~150 lines. CLAUDE.md imports it so Claude Code reads the same thing.
- Skills for task-specific capabilities. Anything that is a multi-step procedure, only matters for part of the codebase, or bundles scripts and reference files. These shouldn’t live in the always-on file — that’s the bloat the ETH Zurich study penalizes. A Skill costs ~100 tokens until it’s needed.
- Path-scoped rules for the middle ground. Claude Code’s `.claude/rules/` with a `paths` frontmatter loads instructions only when the agent touches matching files: narrower than always-on, lighter than a full Skill.
The right mental model: AGENTS.md/CLAUDE.md answer “what should you always know about this project,” Skills answer “what should you know how to do when this specific task comes up.” The mistake that all three eval sources punish is cramming the second category into the first.
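The path-scoped idea can be illustrated with a small matcher. This is a sketch only: the rule names and glob patterns are hypothetical, and it uses Python's `fnmatch`, whose `*` also matches `/`, so it should not be read as Claude Code's actual matching semantics.

```python
from fnmatch import fnmatch

# Hypothetical rules: rule name -> glob patterns, standing in for a `paths` list.
RULES = {
    "billing": ["src/billing/*"],
    "tests": ["tests/*", "*_test.py"],
}


def rules_for(path: str, rules: dict[str, list[str]]) -> list[str]:
    """Return the names of rules whose patterns match the touched file."""
    return [
        name
        for name, patterns in rules.items()
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]


billing_hit = rules_for("src/billing/invoice.py", RULES)
no_hit = rules_for("README.md", RULES)
```

A file outside every pattern loads no extra instructions at all, which is how scoped rules keep the always-on file short.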
Which to use when
| If you want to... | Use | Why |
|---|---|---|
| Give every task the same project context (commands, conventions, layout) | AGENTS.md (imported by CLAUDE.md) | Always-on context — no retrieval decision; works across Codex, Cursor, Claude Code at once |
| Teach a coding agent framework/library docs it keeps getting wrong | Compressed docs index in AGENTS.md | Vercel's eval: always-present index beat an un-triggered Skill (100% vs 79%) |
| Package a multi-step procedure or capability used occasionally | Skill (SKILL.md) | Progressive disclosure — ~100 tokens until triggered, full body on demand |
| Scope instructions to one part of the codebase | Path-scoped rules (.claude/rules/) | Loads only when the agent touches matching files; keeps the always-on file short |
| Use both AGENTS.md (for other tools) and Claude Code | CLAUDE.md with @AGENTS.md import | One source of truth; both tool families read it; no drift |
Two heuristics cut through the noise. Keep the always-on file short and relevant — every source agrees length is the dominant failure mode; aim for Anthropic’s “under 200 lines” and Augment’s 100–150 sweet spot. Move anything occasional or procedural into a Skill so it isn’t taxing every session. Do those two things and the format you nominally “chose” matters far less than the discipline you applied.
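The length heuristic is easy to check mechanically, for example in CI. The sketch below takes its two thresholds from the figures cited in this post (Augment's ~150-line sweet spot and Anthropic's under-200-line guidance); the `length_verdict` name, the verdict labels, and the choice to count every line, blank ones included, are assumptions.

```python
def length_verdict(text: str, sweet_spot: int = 150, ceiling: int = 200) -> str:
    """Classify an instructions file by its total line count.

    "ok"    -> at or under the sweet spot
    "trim"  -> past the sweet spot but still under the ceiling
    "split" -> at or past the ceiling; move detail into Skills or scoped rules
    """
    lines = len(text.splitlines())
    if lines <= sweet_spot:
        return "ok"
    if lines < ceiling:
        return "trim"
    return "split"


short_file = "rule\n" * 120
long_file = "rule\n" * 240

short_verdict = length_verdict(short_file)
long_verdict = length_verdict(long_file)
```

Line count is only a proxy. A 100-line file of irrelevant requirements can still hurt, so a check like this catches bloat but not poor relevance.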
Community signal
Practitioners have been asking for a clean line between these artifacts for months — the most common version of the question is some form of “what is the real difference between Hooks, Skills, Plugins, SKILL.md, CLAUDE.md and agents.md?” on the Claude Code subreddit. The honest answer is the one this post lands on: CLAUDE.md and AGENTS.md are the same kind of always-on instructions file read by different tools, and Skills are a separate progressive-disclosure mechanism for capabilities.
“With AGENTS.md, there's no moment where the agent must decide 'should I look this up?'”
Jude Gao, Vercel · Blog
Vercel's agent evals, Jan 27 2026 — the passive-context argument for always-on docs over a Skill that has to be triggered.
“A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all.”
Augment Code · Blog
Augment's AuggieBench writeup — the reframe that file quality and length, not the format label, drive the result.
“Unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.”
Gloaguen, Mündler et al. (arXiv:2602.11988) · Paper
The ETH Zurich study's conclusion — context files reduced success rates and raised cost on their benchmark when they over-specified.
Frequently asked questions
Does Claude Code read AGENTS.md?
Not natively. Anthropic's own documentation states it plainly: "Claude Code reads CLAUDE.md, not AGENTS.md." The recommended fix is to keep your AGENTS.md as the source of truth and create a CLAUDE.md that imports it with `@AGENTS.md`, so both Claude Code and cross-tool agents read the same instructions. A symlink (`ln -s AGENTS.md CLAUDE.md`) also works if you don't need Claude-specific additions. Running `/init` in a repo that already has an AGENTS.md reads it and folds the relevant parts into the generated CLAUDE.md.
AGENTS.md vs CLAUDE.md — which is better?
They are the same shape of artifact (a markdown instructions file loaded into context at session start) read by different tools. CLAUDE.md is Claude Code's file; AGENTS.md is the open cross-tool convention read by Codex, Cursor, Aider, Gemini CLI, Jules, and others. The practical answer for most teams is not to choose: write one AGENTS.md and have CLAUDE.md import it, so you maintain a single file instead of two that drift apart.
Are Skills better than AGENTS.md?
It depends on what you are teaching the agent. Vercel's January 2026 agent evals found that embedding a compressed docs index directly in AGENTS.md hit a 100% pass rate while Skills maxed out at 79% — because the AGENTS.md context is always present, whereas Skills must be triggered, and in 56% of their eval cases the Skill was never invoked. But Skills win when the knowledge is a large, task-specific capability pack you don't want loaded into every session: Skill metadata costs roughly 100 tokens until triggered, then loads the full body on demand (progressive disclosure). Always-on instructions favor AGENTS.md; heavyweight, occasional capabilities favor Skills.
Can I use AGENTS.md and CLAUDE.md together?
Yes — that is Anthropic's documented recommendation. Put your shared instructions in AGENTS.md, then create a CLAUDE.md whose first line is `@AGENTS.md` and add Claude-specific instructions below it. Claude Code loads the imported file at session start, then appends the rest. This keeps one canonical instructions file for every coding agent while still letting you give Claude Code its own notes.
What is the difference between AGENTS.md and SKILL.md?
AGENTS.md is a single markdown file of always-on project instructions, loaded in full into context at session start. SKILL.md is the entry file of a Skill — a folder-based capability pack using progressive disclosure: only its name and description (about 100 tokens) load at startup, and the full body plus any bundled scripts and reference files load only when the agent decides the Skill is relevant. AGENTS.md is for the context every task needs; Skills are for procedures specific tasks need.
Do AGENTS.md context files actually improve coding agent performance?
The evidence is mixed, which is the whole debate. Vercel and Augment Code report large gains from well-built files. But a February 2026 study from ETH Zurich researchers ("Evaluating AGENTS.md", arXiv:2602.11988) found the opposite on their benchmark: across multiple agents and models, context files "tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%." The reconciling lesson is that file quality and length dominate — bloated or over-prescriptive files hurt, focused minimal ones help.
Which AI tools support AGENTS.md?
The agents.md site lists more than 60,000 open-source projects using the format and names OpenAI's Codex, Google's Jules and Gemini CLI, Cursor, Factory, Aider, GitHub Copilot's coding agent, VS Code, JetBrains' Junie, Cognition's Devin, and Windsurf among supporting tools. The format is now stewarded by the Agentic AI Foundation under the Linux Foundation. Claude Code is the notable holdout that reads CLAUDE.md instead, which is why the import/symlink pattern exists.
How long should an AGENTS.md or CLAUDE.md file be?
Short. Anthropic's docs target under 200 lines per CLAUDE.md; Augment Code's evals found 100–150 line files were the top performers, with gains reversing once files grew past that. The ETH Zurich paper concludes that "unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements." If your instructions are growing, push detail into Skills or path-scoped rules rather than one long file.
Sources
Evals & studies
- vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals — Jude Gao, Jan 27 2026. The 100% vs 79% pass-rate result, the “56% never invoked” figure, the passive-context argument.
- augmentcode.com — A good AGENTS.md is a model upgrade — AuggieBench evals; the 100–150 line sweet spot, the Haiku-to-Opus framing, the reversal past ~150 lines.
- arXiv:2602.11988 — Evaluating AGENTS.md — Gloaguen, Mündler et al., Feb 12 2026. Context files “tend to reduce task success rates” and raise inference cost “by over 20%” on their benchmark.
Primary references
- agents.md — the open format, “a README for agents,” 60,000+ projects, supporting-tool list, Linux Foundation stewardship.
- github.com/openai/agents.md — “a simple, open format for guiding coding agents” (README).
- code.claude.com/docs/en/memory — CLAUDE.md behavior, the “reads CLAUDE.md, not AGENTS.md” statement, the `@AGENTS.md` import and symlink patterns, the under-200-line guidance.
- platform.claude.com — Agent Skills overview — progressive disclosure, the ~100-tokens-metadata claim, the three loading levels.