Why Your Claude Skill Won't Activate (2026)
You wrote a skill. You installed it. Claude can see it in available_skills. And nothing happens. The model just barrels ahead and writes the wrong code anyway. You’re not crazy, and you’re not alone — Ivan Seleznov’s 650-trial replication study and the community’s collective bug reports both land on the same culprit. The fix is one paragraph of frontmatter, written a very specific way. This is the post that walks you to it. If you want the broader extension-method context, our Skills vs MCP vs Subagents decision matrix covers when skills are the right tool at all.

TL;DR + the activation reliability problem
Skills are a discoverable runtime extension, not an always-on tool. At startup, Claude pre-loads only the YAML frontmatter — name and description — for every installed skill. Anthropic’s engineering blog spells it out: “At startup, the agent pre-loads the name and description of every installed skill into its system prompt. This metadata is the first level of progressive disclosure: it provides just enough information for Claude to know when each skill should be used.” That description is the entire trigger surface. Get it right and the skill fires; get it wrong and the rest of your work — the 500-line SKILL.md, the helper scripts, the carefully authored examples — never enters the context window.
The empirical headline number. Ivan Seleznov’s February 2026 Medium post documents a factorial replication study with three skills, three description variants, four environment conditions, eighteen queries, and three repetitions per cell — 650 automated trials in total, verified against ground-truth invocation logs via cclogviewer. Variant C (a directive description starting with “ALWAYS invoke this skill when…”) achieved 100% activation in bare conditions. Variant A (the typical passive description most skills ship with) dropped to 37% when hooks were added. The odds ratio between them was 20.6 (p<0.0001). That’s not a small effect — your skill quality, in activation terms, is dominated by one paragraph of frontmatter.
The community’s lived experience. Scott Spence, after instrumenting his own skill ecosystem, describes the failure mode bluntly: “4/10 globally, 5/10 locally. Basically a coin flip.” And on anthropics/claude-code issue #12679, a closed-as-duplicate ticket titled “Skills don’t automatically start,” the user’s frustrated quote captures the whole problem space: “Even claude can’t work out why it doesn’t automatically start any skills.” The fix isn’t a config toggle. It’s description craft, and it’s teachable.
What we’ll cover. The activation pipeline (what Claude actually reads, when, and at what cost), the eight reasons real skills silently fail, five before/after description rewrites for live skills on this site, how to debug with --verbose and SkillCompass, what differs between Opus / Sonnet / Haiku, the exact frontmatter spec from agentskills.io, cross-agent portability notes, and a reusable test harness you can run nightly. No fluff, no fabricated metrics, every number sourced.
How activation actually works
Skills run on progressive disclosure — Anthropic’s term, from the engineering blog. Three levels:
- Level 1 — Metadata. The YAML frontmatter (name + description) loads at startup, ~100 tokens per skill, into the system prompt. This is what Claude sees when deciding whether your skill is relevant (the sketch after this list tallies that metadata budget for your own install).
- Level 2 — Instructions. The body of SKILL.md — Anthropic recommends keeping it under 5,000 tokens / 500 lines — loads only when Claude has decided the skill is relevant to the current task. The trigger is still the description; the body is reward, not bait.
- Level 3+ — Resources. Linked files (references/REFERENCE.md, scripts/*.py, assets/*) load only when SKILL.md mentions them. They cost nothing in context until Claude chooses to read them.
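To see what that Level-1 surface looks like for your own install, here is a minimal sketch that walks the personal skills directory and tallies the frontmatter Claude pre-loads. It assumes skills live under ~/.claude/skills and uses a crude characters-divided-by-four token estimate rather than the real tokenizer, so treat the totals as ballpark only.

```python
# Rough tally of the Level-1 metadata Claude pre-loads at startup.
# Assumes personal skills under ~/.claude/skills/<name>/SKILL.md and
# uses a crude chars/4 token estimate, not the actual tokenizer.
from pathlib import Path

SKILLS_DIR = Path.home() / ".claude" / "skills"

def frontmatter(skill_md: Path) -> str:
    text = skill_md.read_text(encoding="utf-8")
    parts = text.split("---")
    return parts[1] if len(parts) >= 3 else ""  # text between the first two --- markers

total = 0
for skill_md in sorted(SKILLS_DIR.glob("*/SKILL.md")):
    meta = frontmatter(skill_md)
    tokens = len(meta) // 4  # rough estimate
    total += tokens
    print(f"{skill_md.parent.name:40s} ~{tokens} tokens of metadata")

print(f"\nTotal Level-1 budget: ~{total} tokens across all installed skills")
```

If the total surprises you, remember every one of those tokens is competing for the model's attention each time it decides which skill (if any) to load.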
Anthropic’s docs put it in a single sentence: “If Claude thinks the skill is relevant to the current task, it will load the skill by reading its full SKILL.md into context.” The verb “thinks” is doing all the work. Activation is a model judgement made on a slim metadata record, against a current task, in the presence of every other skill’s metadata. Five things shape that judgement:
- The description text. Specificity, trigger keywords, point of view, directive verbs.
- The skill name. Specifically whether it collides semantically with neighbours.
- The current request shape. A user prompt that doesn’t use any of the skill’s trigger vocabulary will rarely fire it.
- The model. Opus, Sonnet, Haiku each weigh metadata differently — Anthropic’s best-practices doc says outright, “What works perfectly for Opus might need more detail for Haiku.”
- Conflicting context. If a more specific skill above yours covers the same domain, Claude will pick it. If a CLAUDE.md or hook already promised a workflow, that promise can override your skill.
Once you internalise that activation is description-matched, system-prompt-resident, model-judged inference, the failure modes fall out of the model on their own. We’ll work through them next.
The 8 reasons skills don’t fire
Catalogued from Seleznov’s log analysis, the anthropics/claude-code issue tracker, Scott Spence’s instrumentation posts, and the SkillCompass D2-failure histogram. In rough order of how often I’ve watched each one bite people on this directory.
1. Vague description
“Helps with PDFs.” The agentskills.io spec literally cites this as the canonical bad example. No domain anchor, no trigger vocabulary, nothing for Claude’s relevance judgement to grip. Variant A in Seleznov’s study was a tame version of this and dropped activation to 37% with hooks present.
2. Name collision
Two skills both named helpers, or both starting with doc-*, or both claiming “documentation” as their domain. Claude picks one and ignores the other and the choice is non-deterministic. The fix is gerund-form names from a single domain — Anthropic explicitly recommends “processing-pdfs, analyzing-spreadsheets, managing-databases”.
3. Missing trigger words
Your skill processes Excel files but the description never mentions “xlsx”, “spreadsheet”, “pivot table”, or “CSV”. Users type those words; Claude matches against them. The agentskills spec says descriptions “should include specific keywords that help agents identify relevant tasks.”
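Activation matching is semantic, not literal string matching, so this is only a lint, but a description that shares zero vocabulary with what users actually type is a red flag you can catch mechanically. A small sketch, with example phrases that are mine, not from any skill; pull yours from real chat logs:

```python
# Crude trigger-vocabulary lint: flag user phrases that share no words
# with the skill description. Semantic matching may still fire, but
# zero lexical overlap is worth a look. The phrases below are examples only.
DESCRIPTION = "Helps with spreadsheet analysis and reporting."
USER_PHRASES = [
    "summarise this xlsx",
    "build a pivot table from the CSV",
    "clean up this spreadsheet",
]

desc_words = set(DESCRIPTION.lower().split())
for phrase in USER_PHRASES:
    overlap = desc_words & set(phrase.lower().split())
    status = "ok" if overlap else "NO OVERLAP: consider adding these words"
    print(f"{phrase!r:45s} {status} {sorted(overlap)}")
```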
4. Wrong file location
SKILL.md must live at ~/.claude/skills/<name>/SKILL.md (personal), .claude/skills/<name>/SKILL.md (project), or be uploaded via API/zip. The directory name must match the name field exactly. Skills in any other location are silently invisible — they don’t even show up in available_skills.
5. Malformed frontmatter
Uppercase in name, leading hyphen, consecutive hyphens, the literal word “anthropic” or “claude” (reserved), an XML tag in the description, or description exceeding 1024 characters. Each one fails validation. The skill-creator-doctor skill exists specifically to repair these.
6. allowed-tools too narrow
The experimental allowed-tools field is a space-separated allowlist. If your skill needs Bash(git:*) but you wrote Bash(git:status), the skill nominally activates but every action it tries gets blocked. Claude often gives up rather than reporting the issue. Treat allowed-tools as a hint and test it on every model surface.
7. Model / surface mismatch
Your skill works on Opus in Claude Code and silently fails on Sonnet via the API. Likely cause: a description that relies on subtle inference (Opus’s strength) without enough explicit trigger language for Sonnet. Or you’re using a runtime feature (network access, package install) that’s not available on the API surface — the docs spell out which is which.
8. Conflicting skill above it
A more specific skill in the same domain — or a hook that already promised a workflow — wins over yours. SkillCompass’s D6 (Uniqueness) dimension flags this as “model supersession risk.” The fix: explicit negative constraints (“Do not use for X”) and tighter domain scoping in the first sentence.
The good news: of those eight, five are description problems and the other three are filesystem hygiene that takes ten minutes to audit. The next section is where the description work happens.
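The filesystem half of that audit can be scripted. A hedged sketch that checks location, directory/name match, and the basic frontmatter constraints quoted later in the spec section; it uses a naive line-based YAML parse, so treat it as a first pass, not a replacement for skills-ref validate.

```python
# Quick audit for the filesystem-hygiene failures: location, name/dir
# match, and basic frontmatter constraints. Naive parsing; use
# `skills-ref validate` for the authoritative check.
import re
from pathlib import Path

NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # no leading/trailing/consecutive hyphens

def audit(skills_dir: Path) -> None:
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        text = skill_md.read_text(encoding="utf-8")
        fields = dict(
            (m.group(1), m.group(2).strip())
            for m in re.finditer(r"^(name|description):\s*(.*)$", text, re.M)
        )
        name, desc = fields.get("name", ""), fields.get("description", "")
        problems = []
        if name != skill_md.parent.name:
            problems.append(f"name '{name}' != directory '{skill_md.parent.name}'")
        if not NAME_RE.match(name) or len(name) > 64:
            problems.append("name violates charset/hyphen/length rules")
        if any(word in name for word in ("anthropic", "claude")):
            problems.append("name uses a reserved word")
        if not desc or len(desc) > 1024:
            problems.append("description empty or over 1024 chars (first line only checked)")
        print(skill_md.parent.name, "->", problems or "looks clean")

audit(Path.home() / ".claude" / "skills")   # personal scope
audit(Path(".claude/skills"))               # project scope
```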
Anatomy of a description that fires — 5 rewrites
Every “before” below is the live, current description from a skill page on this directory. Every “after” applies Seleznov’s directive template, the agentskills.io trigger-keyword guidance, and the Anthropic best-practices third-person rule. The point isn’t that the originals are bad — they’re what most skills ship with. It’s that the same skill can have a 60% activation rate or a 100% activation rate depending on this paragraph.
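All five rewrites follow the same directive shape, so it helps to see it as a template before the specifics. Here is a small helper that assembles it; the argument names are mine, not Seleznov's, and the wording mirrors the template quoted in the FAQ below.

```python
# Assemble a directive description in Seleznov's shape:
# "<Domain> expert. ALWAYS invoke this skill when ... Do not <fallback> — use this skill first ..."
# Argument names are illustrative, not from any published API.
def directive_description(domain: str, triggers: list[str], fallback: str, payoff: str) -> str:
    desc = (
        f"{domain} expert. ALWAYS invoke this skill when the user mentions "
        f"{', '.join(triggers)}. "
        f"Do not {fallback} directly — use this skill first to {payoff}."
    )
    assert len(desc) <= 1024, "frontmatter description is capped at 1024 chars"
    return desc

print(directive_description(
    domain="Godot Engine",
    triggers=["Godot", ".gd", ".tscn", ".tres", "GDScript"],
    fallback="write Godot code from training data",
    payoff="load the current 4.x patterns, validators, and CLI workflow",
))
```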
1. godot — already strong, tighten the close
Before (321 chars):
This skill should be used when working on Godot Engine projects.
It provides specialized knowledge of Godot's file formats (.gd,
.tscn, .tres), architecture patterns (component-based, signal-driven,
resource-based), common pitfalls, validation tools, code templates,
and CLI workflows.

After (387 chars):
Godot Engine expert. ALWAYS invoke this skill when the user mentions
Godot, .gd, .tscn, .tres, GDScript, or asks to scaffold a Godot 4
project, debug a scene tree, validate a .tres resource, or wire
signals between nodes. Do not write Godot code from training data
directly — use this skill first to load the current 4.x patterns,
validators, and CLI workflow.

What changed: opens with the domain identifier (“Godot Engine expert”), uses the imperative ALWAYS, lists six concrete trigger topics (file extensions + verbs), closes with a negative constraint that blocks the model’s default fallback. The original was descriptive; the rewrite is directive. Same skill page: /skills/godot.
One-line install · by bfollington:
mkdir -p .claude/skills/godot && curl -L -o skill.zip "https://mcp.directory/api/skills/download/235" && unzip -o skill.zip -d .claude/skills/godot && rm skill.zip
Installs to .claude/skills/godot.
2. pdf-to-markdown — reorder for trigger-first
Before (482 chars):
Convert entire PDF documents to clean, structured Markdown for full
context loading. Use this skill when the user wants to extract ALL
text from a PDF into context (not grep/search), when discussing or
analyzing PDF content in full, when the user mentions "load the
whole PDF", "bring the PDF into context", "read the entire PDF",
or when partial extraction/grepping would miss important context.

After (498 chars):
PDF-to-Markdown extraction expert. ALWAYS invoke this skill when
the user mentions PDF, .pdf, "load the whole PDF", "bring the PDF
into context", "read the entire PDF", or asks to extract, convert,
or analyse full PDF documents into Markdown for RAG ingestion or
LLM context loading. Do not page-by-page grep PDFs directly — use
this skill first to load pymupdf4llm + docling and produce a clean,
structured conversion.

Same trigger keywords, but the directive opens with what the skill is, not what it does. SkillCompass D2 scoring weights the first sentence higher than later clauses; lead with the identity.
3. slidev — add trigger vocabulary
Before (211 chars):
Comprehensive guide for Slidev - a web-based presentation framework
for developers. Covers Markdown syntax, layouts, components,
animations, theming, and exporting. Use this skill when creating
or working with developer presentations using Slidev.

After (461 chars):
Slidev developer-presentation expert. ALWAYS invoke this skill when
the user mentions Slidev, slidev.dev, .md slides, presentation as
code, or asks to author, theme, animate, or export a developer
slide deck (Markdown-driven, Vue-component-aware). Triggers include
"build a slide deck", "presentation", "export to PDF", "code slides".
Do not write a stock reveal.js or PowerPoint deck — use this skill
first to scaffold a Slidev project with the current syntax.

The original assumed the user types “Slidev” explicitly. Most don’t — they say “build a slide deck” or “presentation”. The rewrite lists those phrases verbatim so the description matches user vocabulary, not author vocabulary. Skill page: /skills/slidev.
4. shadcn-ui — narrow the version, widen the triggers
Before (262 chars):
shadcn/ui component patterns for Next.js 16 applications. This
skill should be used when adding UI components, customizing
component styles, composing primitives, or integrating forms with
react-hook-form. Covers installation, customization, composition
patterns, and Atlas-specific conventions using Tailwind CSS v4.

After (548 chars):
shadcn/ui + Next.js 16 component expert. ALWAYS invoke this skill
when the user mentions shadcn, shadcn/ui, Radix primitive, or asks
to add, install, customise, theme, or compose UI components in a
Next.js 16 / React 19 / Tailwind v4 codebase. Triggers include
"add a button", "build a form with react-hook-form", "make a
data table", "theme my app", "design system component". Do not
hand-roll a Radix wrapper — use this skill first to install via
the shadcn CLI and follow Atlas conventions.

The original gated activation behind “adding UI components,” which is too abstract — users say “make a data table” or “build a form”. The rewrite enumerates those phrases. Skill page: /skills/shadcn-ui.
5. using-superpowers — the meta-skill
Before (215 chars):
Use when starting any conversation - establishes mandatory workflows
for finding and using skills, including using Skill tool before
announcing usage, following brainstorming before coding, and
creating TodoWrite todos for checklists

After (524 chars):
Skill-orchestration meta-expert. ALWAYS invoke this skill at the
start of every conversation, before any other skill or task.
Establishes the mandatory workflow: search available skills with
the Skill tool before announcing usage, run brainstorming before
coding, create a TodoWrite checklist for any multi-step task. Do
not start writing code or invoking other skills directly — use
this skill first to discover, plan, and track. Triggers: any new
conversation, any unfamiliar task, any time the user asks "what
can you do?".Critical for meta-skills like this one: the trigger has to be “every new conversation” explicitly. Vague framing like “Use when starting any conversation” doesn’t match against a typical user turn — the user says “hello” or skips the greeting entirely and dives into the task. The rewrite spells outevery conversation, every unfamiliar task, every “what can you do?”. Skill page: /skills/using-superpowers.
“The blurbs can be improved if they aren't effective. The description is equivalent to your short term memory.”
seunosewa · Hacker News
On the 816-point HN thread for Anthropic's Skills launch — captures the canonical realisation that the description is the trigger, not the body.
Debugging activation: --verbose, SkillCompass, eval loops
Three approaches, in increasing order of rigour. Pick the one that matches how much you trust your gut.
--verbose: confirm the skill is even visible
In Claude Code, run with the verbose flag and inspect the system-prompt section for available_skills. You’re looking for two things:
- Is your skill in the list at all? If not, the problem is filesystem hygiene — the directory name mismatches the name field, frontmatter validation failed, or you’re looking at the wrong scope (project vs personal). Check both ~/.claude/skills/ and .claude/skills/.
- Is the description what you wrote? Some skill bundles ship a generated description that overrides yours. If the description shown in available_skills doesn’t match your SKILL.md, your file isn’t the canonical source.
If both pass and the skill still doesn’t fire, the description is the problem. Move to SkillCompass.
SkillCompass: scored evaluation
Evol-ai/SkillCompass is a local-first skill evaluator (206 stars, MIT licensed, requires Claude Opus and Node 18+). It scores skills across six weighted dimensions:
| ID | Dimension | Weight | What it measures |
|---|---|---|---|
| D1 | Structure | 10% | Frontmatter validity, markdown format, declarations |
| D2 | Trigger | 15% | Activation quality, rejection accuracy, discoverability |
| D3 | Security | 20% | Secrets, injection, permissions, exfiltration, embedded shell |
| D4 | Functional | 30% | Core quality, edge cases, output stability, error handling |
| D5 | Comparative | 15% | Value over direct prompting |
| D6 | Uniqueness | 10% | Overlap with similar skills, model supersession risk |
The dimension that matters here is D2: Trigger — activation quality, rejection accuracy, discoverability. A skill scoring under 70/100 on D2 is the one you’re actually debugging. The CLI:
# Single skill
/eval-skill ~/.claude/skills/godot
# Directory-wide audit, worst-first
/eval-audit ~/.claude/skills --ci
# Closed loop: fix weakest dimension, re-evaluate, verify, repeat
/eval-evolve ~/.claude/skills/godot

In CI mode, exit codes mean: 0=PASS, 1=CAUTION, 2=FAIL. Wire it into your skill-bundle release pipeline and you’ll catch regressions before users do.
Manual eval loop: Seleznov methodology
The most rigorous debugging is a small replication of Seleznov’s factorial design. You don’t need 650 trials; 50 will tell you enough. Pick three description variants:
- A — Passive (current). Whatever you have now.
- B — Specific (current + trigger keywords). Same prose, append “Triggers include …” with 6–10 phrases users actually type.
- C — Directive (Seleznov template). The “ALWAYS invoke this skill when… Do not X directly” shape.
Pick 18 representative queries. Run each query 3 times under each variant. Parse the agent transcript for the skill name. Compute hit rate per cell. Variant C wins by such a margin in the literature that anything else suggests an unrelated issue (filesystem, conflicting skill, model surface).
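Scoring the cells is just counting. A short sketch for the per-variant hit rate and the odds ratio between two variants, the same statistic Seleznov reports; the trial counts below are placeholders, not his data — plug in your own parsed results.

```python
# Hit rate per description variant plus the odds ratio between two variants
# (the 20.6 figure in Seleznov's study is this statistic, on his data).
# The trial tuples below are illustrative placeholders.
from collections import defaultdict

trials = [("A", True)] * 20 + [("A", False)] * 34 + [("C", True)] * 52 + [("C", False)] * 2

counts = defaultdict(lambda: {"hits": 0, "misses": 0})
for variant, fired in trials:
    counts[variant]["hits" if fired else "misses"] += 1

for variant, c in sorted(counts.items()):
    rate = c["hits"] / (c["hits"] + c["misses"])
    print(f"variant {variant}: {rate:.0%} activation ({c['hits']}/{c['hits'] + c['misses']})")

def odds_ratio(a: dict, b: dict) -> float:
    # (hits_a / misses_a) / (hits_b / misses_b), with a 0.5 continuity
    # correction so a zero-miss cell doesn't divide by zero.
    return ((a["hits"] + 0.5) / (a["misses"] + 0.5)) / ((b["hits"] + 0.5) / (b["misses"] + 0.5))

print(f"odds ratio C vs A: {odds_ratio(counts['C'], counts['A']):.1f}")
```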
A note on cclogviewer, the tool Seleznov used for ground-truth invocation extraction: it reads Claude Code’s session logs and emits a clean stream of tool/skill events. Useful when the agent transcript is ambiguous about whether the skill genuinely fired or was just mentioned.
“Claude is so goal focused that it barrels ahead with what it thinks is the best approach. It doesn't check for tools unless explicitly told to.”
Scott Spence · Blog
Scott Spence's diagnosis from his own instrumentation work — the activation problem isn't a bug, it's a behavioural property of goal-directed inference.
Opus vs Sonnet vs Haiku: how activation differs
Honesty bit first: there is no published activation-rate-by-model benchmark from Anthropic. What we have is the official guidance to test on every model — “Test your Skill with all the models you plan to use it with… What works perfectly for Opus might need more detail for Haiku” — and a body of community reports. Treat what follows as a conjectural taxonomy, not a measured benchmark.
- Opus — the most forgiving model. Will often activate on a passive description if the request is unambiguous. This is the trap: skills authored against Opus quietly degrade when deployed to Sonnet because the author never noticed the description was weak.
- Sonnet — the workhorse and the model that exposes description weakness most reliably. Wants directive language and explicit trigger keywords. If your activation tests pass on Sonnet, they’ll almost certainly pass on Opus.
- Haiku — the strictest. Anthropic’s best-practices doc asks pointedly, “Does the Skill provide enough guidance?” Haiku activates well only when the description is short, directive, and packed with the exact keywords the user typed. It also benefits from more body content — Anthropic notes Opus needs you to avoid over-explaining; Haiku usually needs more detail than you’d give Opus.
Practical rule: if you’re shipping a skill to a multi-model audience, test on Sonnet first and fix until it activates 100% there. Opus will be fine. For Haiku, add 2–3 more concrete trigger phrases and slightly more body specificity, and re-test.
The SKILL.md spec, exactly
Pulled verbatim from agentskills.io/specification (the open spec, Apache 2.0 / CC-BY-4.0, the same one Anthropic publishes at agentskills/agentskills, 17.8k stars). This is the canonical reference; the platform.claude.com docs reproduce a subset.
| Field | Required | Constraints |
|---|---|---|
| name | Yes | 1–64 chars; lowercase a-z, 0-9, hyphens; no leading/trailing/consecutive hyphens; must match parent directory name |
| description | Yes | 1–1024 chars; non-empty; describes what the skill does and when to use it; should include trigger keywords |
| license | No | License name or reference to bundled license file |
| compatibility | No | Max 500 chars; environment requirements (product, packages, network) |
| metadata | No | Arbitrary string-key string-value map; recommended unique key names |
| allowed-tools | No | Space-separated string of pre-approved tools; experimental, not universally honored |
A minimal valid SKILL.md, exactly:
---
name: pdf-processing
description: Extracts text and tables from PDF files, fills PDF
forms, and merges multiple PDFs. Use when working with PDF
documents or when the user mentions PDFs, forms, or document
extraction.
---
# PDF Processing
## Quick start
Use pdfplumber for text extraction:
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
text = pdf.pages[0].extract_text()
```
For form filling, see [FORMS.md](FORMS.md).

That description (PDF skill) is the canonical Anthropic-blessed example, lifted from their docs. Note the shape: leads with what the skill does, includes “Use when”, lists three trigger nouns (“PDFs, forms, document extraction”). It’s a softer cousin of Seleznov’s directive variant — Anthropic recommends it as the floor; Seleznov shows you can push higher.
The directory layout the spec sanctions:
pdf-processing/
├── SKILL.md # required: metadata + body
├── scripts/ # optional: executable code
│ ├── analyze.py
│ └── fill_form.py
├── references/ # optional: documentation
│ └── FORMS.md
└── assets/ # optional: templates, images, schemas
└── invoice-template.json

Validate with skills-ref validate ./my-skill from the agentskills/agentskills repo. It checks every frontmatter rule above, plus the directory-name match.
One-line install · by ananddtyagi:
mkdir -p .claude/skills/skill-creator-doctor && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3878" && unzip -o skill.zip -d .claude/skills/skill-creator-doctor && rm skill.zip
Installs to .claude/skills/skill-creator-doctor.
Cross-agent: same SKILL.md in Cursor, Codex, Cline
The agentskills.io spec is the open format. Most agents read it. The portability story has caveats, though, and pretending otherwise gets you bug reports.
- Claude (Code, claude.ai, API) — the reference implementation. Description-matched at startup. All optional fields supported; allowed-tools experimental but functional.
- Cursor — reads SKILL.md from the same filesystem layout. Description-matching shape is similar to Claude’s; allowed-tools is honoured variably, so treat it as a hint.
- Codex CLI — supports skills in recent releases, but the activation model varies by version. Some releases require explicit invocation; others auto-discover. Check release notes.
- Cline — description-matching agent. Works with the same SKILL.md the rest of the agents read. Best results with directive descriptions.
- Gemini CLI — emerging support, less uniform. Test before you ship.
Portable rule. Write the description for the description-matching baseline (Claude, Cursor, Cline). Treat allowed-tools as documentation rather than enforcement. Avoid agent-specific instructions in SKILL.md — push them to optional reference files that the description doesn’t advertise as universal. If a skill has to behave differently per agent, ship two skills with explicit names.
A reusable activation-test harness
Pseudocode for spotting silent-fail skills, generalised from Seleznov’s methodology. Drop into a script, run nightly against your skill bundle, alert when activation drops below threshold.
# 1. Configuration
SKILL_DIR = "~/.claude/skills"
QUERIES = load_yaml("queries.yml")   # 18 representative prompts per skill, 3 reps each
MODELS = ["claude-opus-4.7", "claude-sonnet-4.7", "claude-haiku-4.7"]
THRESHOLD = 0.85                     # fail below 85% activation

# 2. For each (skill, model, query, rep) cell:
results = []
for skill in list_skills(SKILL_DIR):
    for model in MODELS:
        for query in QUERIES[skill.slug]:
            for rep in range(3):
                session = start_session(model=model)
                send(session, query.text)
                events = parse_session_log(session)
                fired = any(e.skill == skill.name for e in events)
                results.append({
                    "skill": skill.slug,
                    "model": model,
                    "query": query.id,
                    "rep": rep,
                    "fired": fired,
                })

# 3. Compute hit rate per (skill, model)
by_cell = group_by(results, ["skill", "model"])
for (skill, model), rows in by_cell.items():
    hit_rate = sum(r["fired"] for r in rows) / len(rows)
    if hit_rate < THRESHOLD:
        alert(f"{skill} on {model}: {hit_rate:.0%} (below {THRESHOLD:.0%})")

# 4. Optional: regression detection
# Compare hit_rate to last week's baseline. Alert on >10% drop.

Three implementation notes from running variants of this harness on the directory’s top 50 skills.
- Parse logs, not transcripts. Asking Claude “did you use the godot skill?” yields hallucinated yes/no answers. Read the session log’s tool-use events directly.
- Mind the temperature. Activation is partly stochastic. N=3 per cell is the bare minimum; N=5 if you can afford it.
- Pin the model version. Activation rates drift across model releases. A skill that scored 95% six months ago might score 78% on the latest Sonnet — same skill, same harness, different judgement weights. Pin and re-baseline on every model bump.
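For the “parse logs, not transcripts” point, the general shape is a JSONL scan for tool-use events. Claude Code’s session-log schema isn’t formally documented and shifts between releases, so the field names below (type, message, content, name) are assumptions to verify against your own logs; or just use cclogviewer, which handles this for you.

```python
# Ground-truth check: did a given skill actually fire in a session?
# Assumes a JSONL session log whose assistant turns carry tool_use
# content blocks with a "name" field. VERIFY against your own logs,
# or use cclogviewer instead of hand-rolling this.
import json
from pathlib import Path

def skill_fired(log_path: Path, skill_name: str) -> bool:
    for line in log_path.read_text(encoding="utf-8").splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        message = event.get("message")
        blocks = message.get("content", []) if isinstance(message, dict) else []
        if not isinstance(blocks, list):
            continue
        for block in blocks:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                # A skill invocation may appear as a "Skill" tool call or as a
                # tool named after the skill itself; check both shapes.
                if skill_name in str(block.get("name", "")) or skill_name in str(block.get("input", "")):
                    return True
    return False

print(skill_fired(Path("session.jsonl"), "godot"))
```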
“The capability of the system to use the right skill is limited by the little blurb you give about what the skill is for.”
Imnimo · Hacker News
The most-upvoted concern on Anthropic's Skills launch HN thread — the description-as-trigger mental model, in one sentence.
Frequently asked questions
Why isn't my Claude skill activating?
Almost always the description. Claude only sees the SKILL.md frontmatter at startup — name and description, ~100 tokens per skill — and decides to load the body based on that text alone. If the description is vague, written in first person, missing trigger words, or buried under a more specific neighbour, the skill never fires. Anthropic's own best-practices doc says it plainly: "Pay special attention to the name and description of your skill. Claude will use these when deciding whether to trigger the skill in response to its current task."
What activation rate should I expect from a well-written skill?
Ivan Seleznov's 650-trial replication study (published February 2026) found 88.9% overall activation across all conditions, but with a 20× spread between description variants: directive descriptions hit 100% in bare conditions while passive descriptions dropped to 37% when hooks were added. SkillCompass, the open-source evaluator from Evol-ai, scores activation under its D2 (Trigger) dimension at 15% of the overall skill score and flags anything under ~70/100 as a reliability risk.
What does a directive description look like?
Seleznov's template: "<Domain> expert. ALWAYS invoke this skill when the user asks about <trigger topics>. Do not <alternative action> directly — use this skill first." The shape matters more than the exact words: lead with a domain identifier, use the imperative "ALWAYS invoke", list specific trigger topics, and close with a negative constraint that blocks the model's default workaround. Anthropic's docs ask for the same elements with softer language; Seleznov's data shows the harder phrasing wins on reliability.
Where does the SKILL.md file have to live?
In Claude Code, personal skills go under ~/.claude/skills/<skill-name>/SKILL.md and project skills under .claude/skills/<skill-name>/SKILL.md. The directory name must match the name field in the frontmatter exactly. On the API, skills upload via /v1/skills and reference by skill_id. On claude.ai, you upload a zip via Settings → Features. A SKILL.md in any other location won't be discovered.
What are the SKILL.md frontmatter rules?
Two required fields. name: 1–64 characters, lowercase letters, numbers, hyphens only, no leading/trailing/consecutive hyphens, no XML tags, must not contain "anthropic" or "claude" as reserved words, must match the parent directory name. description: 1–1024 characters, non-empty, no XML tags, third person, should describe both what the skill does and when to use it. Optional fields: license, compatibility (max 500 chars), metadata (key-value map), allowed-tools (space-separated, experimental). Source: agentskills.io/specification and platform.claude.com docs.
How do I test skill activation without manual prompting?
Three approaches. (1) cclogviewer — Seleznov's methodology — extracts ground-truth invocation events from session logs so you can run a factorial test of variants × conditions. (2) SkillCompass /eval-skill — pip-installable evaluator that scores each skill across six dimensions including D2 (Trigger). (3) A homebrew harness: 18 representative queries × 3 repetitions per skill, parse the agent transcript for the skill name, compute hit rate. Whichever you pick, run at least N=3 per cell or your variance will swamp the signal.
Does Opus activate skills more reliably than Sonnet or Haiku?
Anthropic's own best-practices doc tells you to test on every model you'll deploy to: "What works perfectly for Opus might need more detail for Haiku." The official guidance is anecdotal — there's no published activation-rate-by-model table from Anthropic — but the consistent community report is that Opus follows nuanced descriptions, Sonnet wants directive language, and Haiku needs both the directive phrasing AND specific trigger keywords in the user prompt. Test all three; don't assume.
Can I force a skill to activate with a hook?
Yes — a UserPromptSubmit hook can detect keywords and inject "INSTRUCTION: Use Skill(<name>) to handle this request" before Claude sees the user's prompt. Scott Spence documents the pattern; Code Coup's "Claude Code Skill Activation Hook" article walks through it. Caveat from Seleznov's data: hooks reduced activation by 90% for passive descriptions (the model treated the hook as a hint and ignored it). If you go the hook route, pair it with a directive description, not a vague one.
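A minimal sketch of that hook, assuming the documented UserPromptSubmit contract (the hook reads a JSON payload on stdin with a prompt field, and anything it prints to stdout is added as context before the model sees the turn). Confirm both against the current hooks docs before relying on it; the trigger map is an example, not a recommendation.

```python
#!/usr/bin/env python3
# UserPromptSubmit hook sketch: if the prompt mentions a trigger keyword,
# inject an instruction to use the matching skill. The field name and the
# stdin/stdout contract are assumptions; check the current hooks docs.
import json
import sys

TRIGGERS = {
    "godot": ["godot", ".gd", ".tscn", "gdscript"],
    "pdf-to-markdown": ["pdf", ".pdf"],
}

payload = json.load(sys.stdin)
prompt = str(payload.get("prompt", "")).lower()

for skill, keywords in TRIGGERS.items():
    if any(kw in prompt for kw in keywords):
        # Pair this with a directive description; per Seleznov's data the
        # hook alone is ignored when the description is passive.
        print(f"INSTRUCTION: Use Skill({skill}) to handle this request.")
        break
```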
Why does my skill fire on irrelevant requests?
The flip side of activation. Two common causes: a description that overlaps semantically with another skill (Claude picks the wrong one), or a description so generic that almost any request matches. SkillCompass's D6 (Uniqueness) dimension exists for exactly this — it checks overlap with similar skills and flags supersession risk. Fix it by narrowing trigger topics, adding a negative constraint ("Do not use for X"), and giving each skill a domain-specific identifier in the first sentence.
Does the same SKILL.md work in Cursor, Codex, and Cline?
Mostly. The agentskills.io spec is the open format and most agents read it. Differences: allowed-tools is experimental and not universally honored. Cursor and Cline rely on description-matching the same way Claude does. Codex's CLI has its own activation model that varies by release. The portable rule: write the description for description-matching agents (Claude, Cursor, Cline) and treat allowed-tools as a hint, not a contract.
How long should my SKILL.md description be?
The hard cap is 1024 characters. The practical sweet spot from community data and SkillCompass scoring: 200–500 characters. Long enough to include the domain, what the skill does, when to use it, and 3–6 concrete trigger topics. Short enough that it doesn't compete with neighbouring skills' metadata in the system prompt. Anthropic's official PDF skill description — quoted in their docs — is 162 characters.
How do I debug which skills Claude actually loaded?
In Claude Code: run with --verbose and inspect the system prompt section for the available_skills list. In claude.ai: "Using <skill name>" appears in Claude's thinking when a skill triggers. On the API: the response includes container metadata showing which skill_ids were referenced. If your skill is in the available_skills list but never triggers, the description is the problem. If it isn't in the list, the file location, name validation, or registry is the problem.
What's the SkillCompass D2 Trigger score actually measuring?
Three sub-dimensions, weighted 15% of the overall skill score: activation quality (does it fire on relevant prompts), rejection accuracy (does it stay silent on irrelevant prompts), and discoverability (can Claude find it among 100+ siblings). A skill that scores 100/100 on D2 fires when it should and stays quiet when it shouldn't. The default fail threshold is below 70/100 — that's the line Evol-ai uses in CI mode (--ci flag exits 2=FAIL).
Should I list trigger keywords in the description?
Yes. The agentskills.io spec says the description "should include specific keywords that help agents identify relevant tasks." Anthropic's good example for the PDF skill explicitly does this: "Use when working with PDF files or when the user mentions PDFs, forms, or document extraction." The trick: pick keywords your users actually type, not synonyms you assume they'll use. Pull them from real GSC data or chat logs, not from your imagination.
Can a skill activate when another skill is already active?
Yes — multiple skills can co-activate, and Anthropic's own docs say skills are designed to compose: "Compose capabilities: Combine Skills to build complex workflows." The risk is when two skills overlap on the same trigger topic. Claude will pick one and ignore the other, and the choice is non-deterministic enough to be a debugging nightmare. The fix is the same as for irrelevant-fire: narrow each skill's domain and add explicit "Do not use for X" lines.
Related reading
If your skill activates but does the wrong thing once it fires, that’s a body-content problem and these skill cookbooks walk through real prompts that work end-to-end.
- /blog/claude-skills-vs-mcp-vs-subagents-vs-cli-2026-decision-matrix — when skills are the right tool at all (vs MCP servers, subagents, plain CLI tools).
- /blog/what-are-claude-code-skills — primer on the Skills feature, the launch context, and how skills compare to slash commands.
- /blog/claude-godot-skill-guide — 10 Godot prototypes as one-prompt cookbooks; example of body-content depth done right.
- /blog/claude-pdf-to-markdown-skill-guide — the canonical Anthropic-shape skill walked through end-to-end.
- /blog/mcp-context-bloat-fix-2026-tool-search-code-mode-progressive-disclosure — progressive disclosure on the MCP side; the same pattern Skills use.
- /skills/skill-creator-doctor — the meta-skill that repairs broken SKILL.md files, including the validation issues from our 8-reasons list.
- /skills/claude-skills-troubleshooting — diagnoses plugin/skill loading issues: enabledPlugins in settings.json, “skill not showing,” “installed but disabled.”
- /skills/skill-creator — Anthropic’s official skill-authoring skill.
- /skills/find-skills — the discovery skill that suggests installable skills for a given user intent.
- /servers/claude-skills — the MCP server that lets agents browse and install skills.
- /skills — browse all 8,000+ skills in the directory.
Sources
Anthropic — primary
- claude.com/blog/skills — Skills launch announcement (October 16, 2025).
- anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills — engineering blog with the “first level of progressive disclosure” description.
- platform.claude.com — Agent Skills overview — three-level progressive disclosure model, frontmatter rules.
- platform.claude.com — Best practices — the third-person rule, gerund-form names, test-on-every-model guidance.
Spec & evaluator
- agentskills.io/specification — open frontmatter spec, all field constraints.
- github.com/agentskills/agentskills — Apache 2.0 / CC-BY-4.0 reference repo (17.8k stars), Anthropic-maintained.
- github.com/Evol-ai/SkillCompass — local-first evaluator (206 stars, MIT), six-dimension scoring including D2 Trigger.
Empirical / community
- Ivan Seleznov — “How to Make Claude Code Skills Actually Activate (650 Trials)” — factorial study, directive description template, 20× odds ratio.
- Scott Spence — Claude Code Skills Don’t Auto-Activate — 4/10 globally, 5/10 locally instrumentation report.
- anthropics/claude-code#12679 — Skills don’t automatically start — closed-as-duplicate, captures the user-side frustration.
- HN 45607117 — Skills launch thread (816 points) — Imnimo, seunosewa, ChadMoran, ugh123 quotes.