Claude capability-evolver skill: 10 self-improvement loops

git-sync

The author recommends running git-sync alongside the evolver so failed mutations roll back through git, not through ad hoc backups.

github

Open the proposed diff as a draft PR instead of letting the evolver commit straight to main.

Auto-log analysis on a stale repo

Point the evolver at a workspace with a year of memory/ and assets/gep/events.jsonl history, and ask it to surface the top failure patterns it would target next.

For: Repo handoff or post-mortem week. The agent has been running for months; you need to know what it has actually been failing at, not what the README claims it does.

The prompt

Use the capability-evolver skill. Read-only pass: scan memory/* and assets/gep/events.jsonl, group failures by error class, and emit a markdown report of the top five recurring failure patterns. Do NOT propose mutations yet. Include the parent-id chain for each pattern so I can trace the lineage in events.jsonl.

What the output looks like

## Top failure patterns (last 90 days)

1. **timeout-on-fetch** (47 events, parent: e_4f2)
   - First seen: 2026-01-12, evomap.ai mailbox poll
   - Repair attempts: 3 (all reverted)
   - Suggested gene: exponential-backoff-with-jitter

2. **stale-capsule-hit** (23 events, parent: e_8a1)
   - Capsule "skip-flaky-tests" matched on tests that
     became deterministic after a CI fix.
   - Suggested action: capsule TTL or fitness re-eval.
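The grouping step the prompt asks for can be sketched in a few lines of Node.js. This is a minimal illustration, not the skill's own implementation: the field names `type`, `error_class`, and `parent_id` are assumptions about the events.jsonl schema, and `topFailurePatterns` is a hypothetical helper.

```javascript
// Sketch only: `type`, `error_class`, and `parent_id` are assumed field
// names, not a documented events.jsonl schema.
function topFailurePatterns(jsonlText, limit = 5) {
  const groups = new Map();
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // tolerate a truncated or corrupt line
    }
    if (event.type !== "failure") continue;
    const key = event.error_class || "unknown";
    const group = groups.get(key) || { errorClass: key, count: 0, parents: new Set() };
    group.count += 1;
    if (event.parent_id) group.parents.add(event.parent_id);
    groups.set(key, group);
  }
  // Most frequent error classes first, capped at `limit`.
  return [...groups.values()]
    .sort((a, b) => b.count - a.count)
    .slice(0, limit)
    .map((g) => ({ errorClass: g.errorClass, count: g.count, parents: [...g.parents] }));
}
```

Keeping the parent ids per group is what lets the report show the lineage for each pattern rather than just a count.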

One-line tweak

Pipe the report through `prompt-improver` to get a tightened version of the evolution prompt that targets only the top pattern next cycle.

Pairs with

prompt-improver

Tightens the evolution prompt against the recurring failure class so the next cycle has a smaller search space.

agent-memory

Cross-references the failure log against agent memory so a single mutation doesn't fix the symptom while the cause re-fires from another path.

Author a Gene against a recurring crash

Turn a recurring stack trace from memory/* into a reusable Gene in assets/gep/genes.json so the next cycle (and other openclaw agents) can pull the fix from the asset store.

For: Anyone shipping the GEP protocol. Genes are the unit of capability transfer; one well-written gene survives many evolutions.

The prompt

Use the capability-evolver skill. Read memory/last-crash.jsonl, identify the root cause (single sentence), and write a new Gene in assets/gep/genes.json with: id (sha256 of content), name, capability, mutation_directive, fitness_signal, and parent_id pointing to the failing event. Append the corresponding event to assets/gep/events.jsonl. Do not run the cycle.

What the output looks like

// assets/gep/genes.json (append)
{
  "id": "sha256:b2f1...",
  "name": "retry-with-backoff",
  "capability": "network.fetch",
  "mutation_directive": "wrap fetch in p-retry, base 200ms, factor 2",
  "fitness_signal": "5xx_rate < 0.02 over 1h",
  "parent_id": "e_4f2",
  "created_at": "2026-04-30T10:14:00Z"
}
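The prompt asks for an id that is the sha256 of the gene's content. One way to derive it is sketched below, assuming the hash covers the gene's JSON with top-level keys sorted and the `id` field itself excluded; the skill's real canonical form may differ, and `geneId` is a hypothetical helper.

```javascript
const { createHash } = require("node:crypto");

// Assumption: the id hashes the gene's JSON with top-level keys sorted and
// the `id` field left out. Nested objects are not re-sorted in this sketch.
function geneId(gene) {
  const { id, ...content } = gene; // drop any existing id before hashing
  const sorted = {};
  for (const key of Object.keys(content).sort()) {
    sorted[key] = content[key];
  }
  const digest = createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
  return "sha256:" + digest;
}
```

Sorting keys before hashing means two agents that write the same gene with a different field order still arrive at the same id.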

One-line tweak

Make `fitness_signal` measurable on the local mailbox before promotion to EvoMap Hub. A gene the Hub cannot validate gets rejected at asset_submit_result.

Pairs with

skill-creator

Genes that mature into general patterns become their own skills; skill-creator handles the SKILL.md frontmatter side.

evolving-skill-creator

Pairs naturally — same author family, designed for skill-of-skills authoring.

Switch to harden strategy after a chaos week

Override EVOLVE_STRATEGY from balanced to harden so the next N cycles only generate repair / stability mutations — no innovate, no exploration.

For: Production incident week. You need the agent to fix what's broken, not invent. Running innovate after midnight is how unplanned mutations land in your changelog.

The prompt

Use the capability-evolver skill. Set EVOLVE_STRATEGY=harden in the local .env and run a single cycle in review mode. Limit mutation_directive to refactor / repair-only / early-stabilize. Reject any proposed gene whose fitness_signal cannot be measured in the next 60 minutes.

What the output looks like

# .env override
EVOLVE_STRATEGY=harden
EVOLVE_ALLOW_SELF_MODIFY=false
EVOLVER_ROLLBACK_MODE=hard

$ node index.js --review
[evolver] strategy=harden — repair/stabilize only
[evolver] rejected gene "novel-cache-layer" (innovate)
[evolver] accepted gene "circuit-breaker-on-evomap" (harden)
[evolver] fitness window: 60m, signal: error_rate
[evolver] awaiting approval...
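The accept/reject rule in the prompt can be expressed as a small filter. This is a sketch under stated assumptions: the gene shape `{ name, category, fitness_window_minutes }` and the category names are hypothetical, chosen to mirror the log lines above rather than taken from the skill's schema.

```javascript
// Hypothetical categories that count as repair/stability work under harden.
const HARDEN_CATEGORIES = new Set(["refactor", "repair", "harden", "early-stabilize"]);

// Returns one verdict per proposed gene. Assumed gene shape:
// { name, category, fitness_window_minutes }.
function reviewForHarden(genes, maxWindowMinutes = 60) {
  return genes.map((gene) => {
    if (!HARDEN_CATEGORIES.has(gene.category)) {
      return { name: gene.name, accepted: false, reason: `category "${gene.category}" not allowed under harden` };
    }
    // A missing or non-numeric window fails this check and is rejected too.
    if (!(gene.fitness_window_minutes <= maxWindowMinutes)) {
      return { name: gene.name, accepted: false, reason: "fitness signal not measurable within the window" };
    }
    return { name: gene.name, accepted: true, reason: null };
  });
}
```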

One-line tweak

After two harden cycles in a row, switch to early-stabilize for one cycle, then back to balanced. Three weeks on harden and the search space narrows enough that the agent stops finding wins.

Pairs with

sentry

The harden strategy needs a real error-rate signal; Sentry releases give the evolver a fitness number it can poll.

claude-md-improver

Hardens the CLAUDE.md so a fresh agent inherits the harden-week guardrails next time.

Capsule deduplication for repeated reasoning

Walk capsules.json and merge, or add a TTL to, the success capsules that fired more than once on the same parent_id. Repeated fires mean the agent has been re-deriving the same answer.

For: Long-running agents whose thinking budget keeps growing. Capsule bloat is the silent killer: the same reasoning, retried.

The prompt

Use the capability-evolver skill. Read assets/gep/capsules.json. For each capsule that fired ≥2 times against the same parent_id in the last 30 days, propose either (a) merge into a single capsule with the union of contexts, or (b) add a TTL field expiring on the next solidify. Output a unified diff against capsules.json. Do not write yet.

What the output looks like

--- a/assets/gep/capsules.json
+++ b/assets/gep/capsules.json
@@ -42,5 +42,6 @@
   {
-    "id": "cap_a1",
-    "fires_on": ["e_4f2"],
-    "context": "fetch retry",
+    "id": "cap_a1_merged",
+    "fires_on": ["e_4f2", "e_4f3", "e_5b8"],
+    "context": "fetch retry (merged a1+a3+a8)",
+    "ttl_until_solidify": 1
   }
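The merge option (a) can be sketched as a group-and-union pass. The `id`, `fires_on`, and `context` fields come from the diff above; `parent_id` on each capsule and the `_merged` id suffix are assumptions made for illustration.

```javascript
// Sketch: merge capsules that share an (assumed) `parent_id` field.
function mergeCapsules(capsules) {
  const byParent = new Map();
  for (const capsule of capsules) {
    const group = byParent.get(capsule.parent_id) || [];
    group.push(capsule);
    byParent.set(capsule.parent_id, group);
  }
  const merged = [];
  for (const group of byParent.values()) {
    if (group.length < 2) {
      merged.push(...group); // nothing to deduplicate
      continue;
    }
    merged.push({
      id: `${group[0].id}_merged`,
      parent_id: group[0].parent_id,
      // Union of trigger events and of distinct contexts across the group.
      fires_on: [...new Set(group.flatMap((c) => c.fires_on))],
      context: [...new Set(group.map((c) => c.context))].join("; "),
      ttl_until_solidify: 1,
    });
  }
  return merged;
}
```

This sketch does not detect contradictory capsules; that check would have to run before the merge and skip the group.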

One-line tweak

If two capsules share a parent but contradict each other, do NOT merge — that's the evolver's signal that the parent failure has two distinct causes you haven't separated yet.

Pairs with

agent-memory

Capsule consolidation echoes memory consolidation — same problem, same compaction shape, both should run on the same cron.

continuous-learning

Capsule TTL is exactly the 'when to forget' question continuous-learning addresses.

Mailbox poll + claim a Hub task

Use the local Proxy mailbox API to subscribe to a capability, claim the next matching task pushed by the EvoMap Hub, work it, and post the result asset.

For: Anyone running the evolver as part of a connected fleet. The Proxy is the only sanctioned channel — agents that call evomap.ai directly bypass the audit log.

The prompt

Use the capability-evolver skill. Subscribe to capabilities ['code_review', 'bug_fix'] via POST {PROXY_URL}/task/subscribe. Poll {PROXY_URL}/mailbox/poll for type=task_available, claim the first task via /task/claim, do the work, and submit the result via /task/complete with the asset_id of the gene you produced. Acknowledge every consumed message via /mailbox/ack.

What the output looks like

// Shared options for JSON POSTs to the Proxy
const post = (body) => ({
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});

// 1. Subscribe
await fetch(`${PROXY_URL}/task/subscribe`,
  post({ capability_filter: ["code_review", "bug_fix"] }));

// 2. Poll for new tasks
const r = await fetch(`${PROXY_URL}/mailbox/poll`,
  post({ type: "task_available", limit: 1 })).then((res) => res.json());

// 3. Claim → work → complete (only the claim step is shown)
const task = r.messages[0];
if (task) { // the mailbox may be empty on any given poll
  await fetch(`${PROXY_URL}/task/claim`, post({ task_id: task.id }));
}
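The remaining lifecycle steps from the prompt (complete, then ack) might look like the sketch below. The endpoint paths come from the prompt; the body fields `asset_id` and `message_ids` are assumptions, and `send` is an injected, hypothetical helper that POSTs JSON to the Proxy and returns the parsed response.

```javascript
// `send(path, body)` is injected so the flow can be exercised without a
// running Proxy. Body field names other than `task_id` are assumptions.
async function runOneTask(send, doWork) {
  const polled = await send("/mailbox/poll", { type: "task_available", limit: 1 });
  const message = polled.messages && polled.messages[0];
  if (!message) return null; // nothing to claim this round

  await send("/task/claim", { task_id: message.id });
  const assetId = await doWork(message); // produce the gene, return its asset id
  await send("/task/complete", { task_id: message.id, asset_id: assetId });
  await send("/mailbox/ack", { message_ids: [message.id] });
  return assetId;
}
```

Acking last means a crash mid-task leaves the message unacknowledged, so it stays visible instead of silently disappearing.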

One-line tweak

Set EVOMAP_PROXY_PORT in your .env if you run more than one evolver per host — the default 19820 collides on the second instance.

Pairs with

self-improving-agent

Same mental model — claim → work → reflect → publish — pairs cleanly with the Proxy mailbox lifecycle.

skill-evolution-manager

Skill-evolution-manager handles the upstream registry; the evolver runs the per-agent loop. They split cleanly.

Continuous loop with cron + git-sync

Schedule node index.js --loop under cron with EVOMAP_PROXY=1 and a parallel git-sync job, so the evolver runs every 15 min and any rolled-back mutation is recoverable.

For: Teams that have completed three weeks of clean review-mode cycles. Continuous loop is the steady state, but only after you trust the agent's diff quality.

The prompt

Use the capability-evolver skill. Generate (a) a crontab entry that runs `EVOMAP_PROXY=1 EVOLVE_STRATEGY=balanced node /opt/evolver/index.js --loop` every 15 minutes, (b) a parallel git-sync cron pushing the workspace to a private remote, (c) a logrotate config for memory/*.jsonl capped at 100MB. Output the three files in order; do not invoke them.

What the output looks like

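# /etc/cron.d/evolver
# cron has no backslash line continuation, so each job must sit on one line.
# flock -n skips a run while an earlier --loop process still holds the lock.
*/15 * * * * evolver cd /opt/evolver && EVOMAP_PROXY=1 EVOLVE_STRATEGY=balanced flock -n /tmp/evolver.lock node index.js --loop >> /var/log/evolver.log 2>&1

# /etc/cron.d/evolver-git-sync
*/5 * * * * evolver cd /opt/evolver && git add -A && git commit -m "auto: $(date -Iseconds)" && git push origin main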

# /etc/logrotate.d/evolver
/opt/evolver/memory/*.jsonl {
  size 100M
  rotate 7
  copytruncate
}

One-line tweak

Pin EVOLVE_LOAD_MAX=1.0 (default 2.0) on shared hosts — the agent backs off when 1-min load exceeds the cap, which keeps it from compounding noisy-neighbor problems.
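The back-off check described here amounts to comparing the 1-minute load average with the cap. A minimal sketch follows, assuming the cap is compared against the raw load figure rather than a per-core value; `shouldBackOff` is a hypothetical helper, not the evolver's own function.

```javascript
const os = require("node:os");

// Returns true when the 1-minute load average exceeds the cap.
// Falls back to 2.0, the default stated above, if EVOLVE_LOAD_MAX is unset.
// Note: os.loadavg() always reports [0, 0, 0] on Windows.
function shouldBackOff(
  loadMax = Number(process.env.EVOLVE_LOAD_MAX || 2.0),
  oneMinuteLoad = os.loadavg()[0]
) {
  return oneMinuteLoad > loadMax;
}
```

Calling `shouldBackOff()` with no arguments reads the live load; passing both values makes the rule easy to check in isolation.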

Pairs with

git-sync

The exact companion the SKILL.md recommends in the Safety section: 'Always recommended to have a git-sync cron job running alongside this skill.'

datadog

Pipe /var/log/evolver.log into Datadog so cycle frequency, mutation acceptance rate, and rollback rate become dashboards instead of tail -f.

Submit a Gene asset to EvoMap Hub for review

Take a locally-validated Gene from genes.json and publish it to the Hub via /asset/submit, then poll for the asset_submit_result to learn whether the Hub accepted, rejected, or quarantined it.

For: Authors who want their improvements to land in other openclaw agents, not just stay local. The Hub is the distribution point.

The prompt

Use the capability-evolver skill. POST a single asset to {PROXY_URL}/asset/submit: type='Gene', content from assets/gep/genes.json id 'sha256:b2f1...'. Capture the message_id. Poll /mailbox/poll for type=asset_submit_result every 30s up to 5 min. On accepted, ack the message. On rejected, write the reject reason into memory/asset-rejections.jsonl with parent_id=message_id.

What the output looks like

// Helper: resolve after `ms` milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const headers = { "Content-Type": "application/json" };

// Submit
const submit = await fetch(`${PROXY_URL}/asset/submit`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    assets: [{ type: "Gene", content: gene, parent_id: gene.parent_id }],
  }),
}).then((res) => res.json());

// Poll for result: 10 attempts x 30 s = 5 min
let result;
for (let i = 0; i < 10; i++) {
  await sleep(30_000);
  const r = await fetch(`${PROXY_URL}/mailbox/poll`, {
    method: "POST",
    headers,
    body: JSON.stringify({ type: "asset_submit_result", limit: 5 }),
  }).then((res) => res.json());
  result = r.messages.find((m) => m.payload.message_id === submit.message_id);
  if (result) break;
}
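The rejection branch from the prompt, writing the reason to memory/asset-rejections.jsonl with parent_id set to the message id, could be handled as sketched here. The payload fields `status` and `reason` are assumptions about the asset_submit_result message, and `recordRejection` is a hypothetical helper.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Appends one JSONL entry for a rejected submission. Returns true if an
// entry was written. `payload.status` and `payload.reason` are assumed names.
function recordRejection(result, messageId, file = "memory/asset-rejections.jsonl") {
  if (!result || !result.payload || result.payload.status !== "rejected") {
    return false; // accepted, quarantined, or timed out: nothing to log here
  }
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const entry = {
    parent_id: messageId,
    reason: result.payload.reason || "unspecified",
    recorded_at: new Date().toISOString(),
  };
  fs.appendFileSync(file, JSON.stringify(entry) + "\n");
  return true;
}
```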

One-line tweak

Set GITHUB_TOKEN in your .env so a Hub-rejected gene also opens an issue on the source repo. Otherwise the rejection sits in memory/ and nobody looks.

Pairs with

github

Hub rejections become GitHub issues automatically when GITHUB_TOKEN is set. It is the one setting that closes the loop end-to-end.

skill-evolution-manager

Manages the upstream registry view of accepted genes — the evolver is the producer, skill-evolution-manager is the consumer-side index.

Rollback strategy stress test

Force-fail a planned mutation and verify EVOLVER_ROLLBACK_MODE=hard restores src/ + memory/ exactly. Until you have proven the rollback works, --loop is a footgun.

For: Pre-prod hardening. You're about to enable continuous mode in front of real traffic; you need to know the rollback isn't theoretical.

The prompt

Use the capability-evolver skill. Plant a deliberately-broken gene (e.g. a syntax error in a generated patch), run a single cycle with EVOLVER_ROLLBACK_MODE=hard, and verify (a) the cycle fails, (b) git status shows clean, (c) memory/ does NOT contain the broken gene's parent_id event. Output the verification commands and their expected output.

What the output looks like

# 1. Plant broken gene
echo '{"id":"sha256:bad","mutation_directive":"${SYNTAX"}' \
  >> assets/gep/genes.json

# 2. Run with hard rollback
EVOLVER_ROLLBACK_MODE=hard node index.js
# Expected: [evolver] mutation failed, rolling back

# 3. Verify
git status                          # expect: clean
grep "sha256:bad" assets/gep/genes.json  # expect: empty
grep "parent_id.*sha256:bad" memory/*.jsonl  # expect: empty

One-line tweak

If git status is NOT clean after a forced-fail cycle, set EVOLVER_ROLLBACK_MODE=stash for one cycle to capture the half-applied patch, then debug. Never re-enable hard until git status returns clean reliably.

Pairs with

git-sync

Hard rollback works against the local working tree; git-sync makes sure the remote also sees the rollback so a teammate doesn't pull the broken state.

github

Branch-protect main so a hard rollback can't push the broken intermediate to the shared remote even by mistake.

Strategy switcher: balanced → innovate → harden

Schedule three back-to-back cycles with different EVOLVE_STRATEGY values and tag each in events.jsonl, so the post-mortem can see which strategy produced which mutation. The strategy switcher is how you debug the agent's taste.

For: Researchers and team leads tuning the agent's bias. The default 'balanced' is conservative; you'll outgrow it.

The prompt

Use the capability-evolver skill. Run three sequential cycles in review mode: cycle 1 EVOLVE_STRATEGY=balanced, cycle 2 EVOLVE_STRATEGY=innovate, cycle 3 EVOLVE_STRATEGY=harden. Tag each event in events.jsonl with strategy=<name>. After all three, output a markdown table of (strategy, proposed gene, accepted, fitness_signal) so I can see which bias matched my failures.

What the output looks like

| Cycle | Strategy   | Proposed gene             | Accepted | Fitness signal      |
|-------|------------|---------------------------|----------|---------------------|
| 1     | balanced   | retry-with-jitter         | yes      | 5xx_rate < 0.02     |
| 2     | innovate   | speculative-prefetch      | no       | unmeasurable in 60m |
| 2     | innovate   | shadow-cache-layer        | no       | drift risk          |
| 3     | harden     | circuit-breaker-on-evomap | yes      | error_rate < 0.01   |

# Read back from events.jsonl
$ jq 'select(.strategy=="innovate")' assets/gep/events.jsonl | head
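Once events carry a strategy tag, the per-strategy acceptance rate can be tallied in a few lines. This sketch assumes each tagged event has a `strategy` string and a boolean `accepted` field; only `strategy` appears in the source, so `accepted` is a hypothetical field name.

```javascript
// Counts proposed and accepted mutations per strategy from JSONL text.
// `accepted` is an assumed boolean field on each tagged event.
function acceptanceByStrategy(jsonlText) {
  const stats = {};
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const event = JSON.parse(line);
    if (!event.strategy) continue; // untagged events are ignored
    const entry = stats[event.strategy] || { proposed: 0, accepted: 0 };
    entry.proposed += 1;
    if (event.accepted === true) entry.accepted += 1;
    stats[event.strategy] = entry;
  }
  return stats;
}
```

A strategy with many proposals and zero acceptances is the case the tweak below is about.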

One-line tweak

If `innovate` produces zero accepted mutations across three cycles, your fitness signals are too narrow — innovate needs slack to be useful. Widen the fitness window to 24h before declaring innovate broken.

Pairs with

self-improvement

Mental model overlap: strategy-switching is the skill side of the same loop self-improvement formalises at the workflow level.

claude-md-improver

Captures the strategy-switching policy in CLAUDE.md so a fresh agent doesn't burn the same three cycles re-discovering it.

Community signal

Three voices that frame the underlying problem. The first is the academic shape of the loop, the second is why archives beat single optimisers, the third is the closest production cousin shipping today.

“We introduce the Darwin Gödel Machine, a self-improving system that iteratively modifies its own code (thereby also improving its ability to modify its own codebase) and empirically validates each change using coding benchmarks.”

Sakana AI / Darwin Gödel Machine paper · Blog

The canonical academic framing of the loop capability-evolver runs locally. Empirical validation of each change is the same role assets/gep/events.jsonl plays.

“With the open-ended approach, humans can find any pattern at any generation. With the SGD approach, you can only look for one pattern.”

bwest87 (HN) · Hacker News

From the Darwin Gödel Machine HN thread. Captures why an archive-of-genes (capability-evolver's GEP store) outperforms a single-objective optimiser on long-horizon tasks.

“AlphaEvolve evolves entire codebases (not just single functions) by leveraging an ensemble of LLMs combined with automated evaluation.”

DeepMind AlphaEvolve team · Blog

The 2025 production case for evolutionary coding agents. The mutation_directive + fitness_signal pair in capability-evolver's Gene format mirrors AlphaEvolve's selection criterion.

The contrarian take

Self-improving agents have an honest skepticism problem: a gene that “works” may just be the model regurgitating something from training. The sharpest version of that critique is from lionkor (HN, AlphaEvolve thread):

“Show the training set, and PROVE that the answers aren't in there. I don't understand why this is not a default first step.”

lionkor (HN, AlphaEvolve thread) · Hacker News

From the AlphaEvolve HN thread.