Best AI Agent Memory 2026: Mem0 vs Letta vs Zep vs Cognee
Best AI agent memory in 2026 means keeping an agent coherent over weeks, not turns — and the four serious frameworks do this in four genuinely different ways. Mem0 is a memory layer you bolt onto an existing agent. Letta is a runtime where the agent is its memory. Zep builds a temporal knowledge graph from conversation. Cognee builds a knowledge graph from everything else. We pulled architecture details from every repo, then mapped each one to the canonical agent use cases — so you can pick by problem shape, not vendor marketing.

TL;DR + decision tree
- If you want the fastest path from zero to working memory on an existing agent — five lines of Python, hosted or self-hosted, with a clean SDK — pick Mem0. It’s a memory layer, not a framework. Drop it in.
- If memory is the architecture — you want stateful agents you address as services, each with a self-edited core scratchpad plus a giant archival store — pick Letta. The MemGPT lineage makes the memory hierarchy the framework’s reason to exist.
- If your agent is conversational and the value is “remember what was true about this user as of last Tuesday” — chronological recall, fact extraction, temporal validity — pick Zep. The temporal knowledge graph (Graphiti underneath) is the product.
- If your memory problem is shaped like a graph — documents, entities, relationships, codebases, organisation charts — pick Cognee. It’s closer to graph-RAG than to chat memory, and the remember / recall / forget / improve pipeline reflects that.
These four overlap less than the marketing surfaces suggest. Mem0 and Zep both call themselves “memory for AI agents,” but Mem0 optimises for stable user-preference recall while Zep optimises for chronologically correct fact lookup. Letta and Mem0 both let you persist agent state, but Letta hosts the agent itself while Mem0 is a library you call from your own runtime. Cognee shares an Apache 2.0 license and a Python-first SDK with all three but lives closer to Neo4j + LLM than to a memory layer in the chat sense. Read the architectures section before you write any code — unwinding the wrong framework later is a refactor, not a database swap.
What “agent memory” actually means
The word “memory” gets stretched to cover at least five different things in agent-land. Separate them before comparing frameworks — most arguments about which is best are really arguments about which kind of memory matters for the problem at hand.
Short-term / context-window memory is what the model already has: the prompt plus the last N turns. No framework solves this; it’s a property of the model and the orchestration layer. Token budget, attention degradation, and per-call cost are all in play. Everything below is about externalising state so the model doesn’t hold it all in context every turn.
Long-term / persistent memory is the generic name for anything written to a store that survives between sessions. All four frameworks here are solutions for it; they differ in what they think long-term memory should contain.
Episodic memory records specific things that happened — “on Tuesday the user said the checkout flow was broken, on Wednesday support marked it resolved.” Zep is the framework most obviously designed for this shape: every fact has a valid_at and invalid_at timestamp so you can ask “was this true on Wednesday morning?” without the model hallucinating. Mem0 stores facts but doesn’t index them temporally as a first-class concern; Letta approximates episodic recall through archival search.
Semantic memory is timeless factual knowledge — “the user prefers metric units,” “HQ is in Amsterdam.” Mem0’s sweet spot. The April 2026 algorithm update reported 91.6 on LoCoMo (a long-conversation recall benchmark), so the team is optimising hard for this shape. Letta’s core-memory blocks are primarily semantic too — facts the agent has learned and edits as the relationship evolves.
Procedural memory — “how to do something” — is the least well-served by any of the four. You’ll mostly handle it yourself by storing tool-call sequences or successful workflows. Letta gets closest, since an agent can self-edit a memory block titled “how I solve refund disputes” and the next session inherits it.
Cognee sits across all of these by being a graph layer rather than a memory layer in the chat sense. Primitives are nodes, edges, and the cognify pipeline that builds them from unstructured input. If your “memory” is really “a corpus of documents the agent traverses by relationship,” Cognee fits even though the chat-memory vocabulary barely applies.
Three memory architectures
Strip the surface APIs and the four frameworks reduce to three architectural patterns. Knowing which matches your workload is more useful than any benchmark.
Architecture 1 — vector index + extraction (Mem0)
The cleanest vector-first pattern. New messages arrive, an extraction step pulls out atomic facts, each fact gets embedded and stored with structured metadata (user_id, agent_id, session, timestamp). Retrieval fuses semantic similarity, BM25, and entity match into one ranked list. Mental model: structured notes with a vector index for fuzzy lookup. Strengths: simple API, fast lookup, low token overhead, easy to bolt onto any agent. Weaknesses: weak at modelling relationships between facts (graph is implicit in metadata), and chronology is just another metadata field, not a first-class temporal index.
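A toy sketch makes the pattern concrete. Everything below is a stand-in (naive sentence splitting for the LLM extractor, token overlap for the fused semantic/BM25/entity score): it shows the shape of the write and read path, not Mem0's internals.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Fact:
    text: str
    user_id: str
    created_at: float = field(default_factory=time.time)

class VectorExtractionStore:
    def __init__(self):
        self.facts: list[Fact] = []

    def add(self, message: str, user_id: str) -> None:
        # Real systems use an LLM to split messages into atomic facts;
        # naive sentence splitting stands in for that step here.
        for sentence in filter(None, (s.strip() for s in message.split("."))):
            self.facts.append(Fact(sentence, user_id))

    def search(self, query: str, user_id: str, limit: int = 5) -> list[str]:
        # Real retrieval fuses embedding similarity, BM25, and entity match;
        # token overlap stands in for that fused score here.
        q = set(query.lower().split())
        scored = [
            (len(q & set(f.text.lower().split())), f.text)
            for f in self.facts
            if f.user_id == user_id
        ]
        return [t for s, t in sorted(scored, reverse=True)[:limit] if s > 0]

store = VectorExtractionStore()
store.add("I prefer metric units. I live in Berlin.", user_id="alex")
print(store.search("Which units does the user prefer?", user_id="alex", limit=1))
```

Notice that chronology is just `created_at` metadata, exactly the property that separates this architecture from the temporal graph below.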
Architecture 2 — tiered context with self-edit (Letta)
Letta inherits its memory model from MemGPT, which first treated the LLM context window as a tiered cache. The agent has a small core memory — a few hundred tokens of structured blocks the agent reads and writes itself via tool calls — and a much larger archival memory outside the context window, paged in via search. Older turns get summarised and pushed down the hierarchy. The framework manages page-in/page-out for you. Strengths: the agent feels genuinely stateful — it remembers what you told it last month because it wrote that fact into a core block. Weaknesses: opinionated runtime (you live in Letta’s server model), and the core-memory budget is a hard design constraint.
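The hierarchy is easier to see in a toy form. The class below is a hypothetical sketch, not Letta's API: a word budget stands in for the token budget, and overflow is demoted verbatim instead of being summarised.

```python
class TieredMemory:
    def __init__(self, core_budget_words: int = 12):
        self.core: list[str] = []      # small, always in context
        self.archive: list[str] = []   # large, paged in via search
        self.budget = core_budget_words

    def _core_words(self) -> int:
        return sum(len(f.split()) for f in self.core)

    def core_memory_append(self, fact: str) -> None:
        self.core.append(fact)
        # Page out the oldest facts when the core exceeds its budget;
        # the real framework summarises rather than moving verbatim.
        while self._core_words() > self.budget and len(self.core) > 1:
            self.archive.append(self.core.pop(0))

    def archival_search(self, term: str) -> list[str]:
        return [f for f in self.archive if term.lower() in f.lower()]

mem = TieredMemory()
mem.core_memory_append("Name: Alex")
mem.core_memory_append("Prefers metric units and bullet points")
mem.core_memory_append("Moving to Berlin next month")
print(mem.core)                      # recent facts within budget
print(mem.archival_search("Alex"))   # older facts paged to archive
```

The hard design constraint shows up immediately: every word the agent writes into core displaces something else.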
Architecture 3a — temporal knowledge graph (Zep)
Zep, powered by Graphiti, treats every message as a source of facts about entities and relationships, and every fact as a chronologically scoped edge in a knowledge graph. The validity window — valid_at and invalid_at — is the key innovation. If the user said “I work at Acme” in March and “I left Acme” in July, the graph encodes both with non-overlapping validity; a query in August retrieves the second. Strengths: chronological correctness over long horizons, automatic entity/fact extraction, sub-200ms lookups per Zep’s benchmark page. Weaknesses: graph backend (Neo4j-class) is real operational overhead, and the extraction pipeline burns LLM credits on high-volume conversations.
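The validity-window mechanic can be sketched in a few lines. This is a deliberately naive model (any new fact on a subject closes the previous window, where the real pipeline detects contradictions with an LLM), and the names are hypothetical, not Graphiti's schema.

```python
from datetime import datetime

class TemporalFacts:
    def __init__(self):
        self.edges: list[dict] = []

    def assert_fact(self, subject: str, fact: str, at: datetime) -> None:
        # Close the validity window of any open fact this one supersedes.
        for e in self.edges:
            if e["subject"] == subject and e["invalid_at"] is None:
                e["invalid_at"] = at
        self.edges.append({"subject": subject, "fact": fact,
                           "valid_at": at, "invalid_at": None})

    def true_at(self, subject: str, when: datetime) -> list[str]:
        # A fact holds at `when` if its window covers that instant.
        return [e["fact"] for e in self.edges
                if e["subject"] == subject
                and e["valid_at"] <= when
                and (e["invalid_at"] is None or when < e["invalid_at"])]

g = TemporalFacts()
g.assert_fact("alex.employer", "works at Acme", datetime(2026, 3, 1))
g.assert_fact("alex.employer", "left Acme", datetime(2026, 7, 1))
print(g.true_at("alex.employer", datetime(2026, 5, 1)))  # ['works at Acme']
print(g.true_at("alex.employer", datetime(2026, 8, 1)))  # ['left Acme']
```

The point is that the query carries a timestamp, so "as of when?" is answered by the index, not by the model's judgement.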
Architecture 3b — full knowledge graph from anything (Cognee)
Cognee shares the graph orientation with Zep but isn’t chat-shaped. It ingests arbitrary content — documents, code, emails, structured records — through a cognify pipeline that produces a knowledge graph plus embeddings. The recall step routes queries to whichever search strategy fits (graph traversal, semantic, keyword, hybrid). Strengths: best fit when your memory problem is really knowledge management; broadest backend flexibility (Neo4j, Kuzu, NetworkX, multiple vector stores). Weaknesses: heavier conceptual model — you think about the graph you’re building, not just call memory.add. Wrong tool for “remember the user’s favourite colour.”
Four quick litmus tests: if the question is “what does this user prefer,” vector + extraction wins (Mem0). If it’s “what was true at time T, given a moving conversation,” temporal graph wins (Zep). If it’s “how do entities in my corpus connect,” full graph wins (Cognee). If it’s “I want the agent itself to feel durable,” tiered runtime wins (Letta).
Side-by-side matrix
Every cell pulled from the official repo, docs, or pricing page. Snapshot taken 2026-05-11; pricing and feature surfaces drift, so confirm at the source before committing.
| Dimension | Mem0 | Letta | Zep | Cognee |
|---|---|---|---|---|
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| Primary model | Vector + structured fact extraction | Tiered core + archival memory (MemGPT lineage) | Temporal knowledge graph (Graphiti) | Knowledge graph from unstructured input |
| Hosted offering | Yes — app.mem0.ai | Yes — Letta Cloud | Yes — Zep Cloud | Self-host (no first-party hosted as of 2026) |
| Self-host | Yes — Docker + Postgres + vector store | Yes — single server + Postgres | Yes — Community Edition + graph backend | Yes — flexible backends |
| Agent runtime included | No — library you call from your runtime | Yes — agents-as-a-service via REST | No — memory store you call from your runtime | No — Python framework you call |
| Knowledge graph | Implicit via entity metadata | Implicit via core blocks | Yes — first-class temporal graph | Yes — first-class graph (multiple backends) |
| MCP server (first-party) | Yes — listed on mcp.directory | Tools surface via Letta server | Community shims (early) | Community shims (early) |
| Primary language | Python (TS SDK) | Python (TS SDK) | Python (TS SDK, Go internals) | Python |
| Best at | Stable user preferences, fast recall | Long-running stateful agents | Conversational temporal recall | Document / entity graph traversal |
Three takeaways. First, every framework here is Apache 2.0 — licensing is not a tiebreaker. Second, only Letta includes an agent runtime; the other three are libraries you call from yours, which matters for orchestration choices. Third, only Zep and Cognee ship first-class knowledge-graph indexes; Mem0 and Letta can model relationships through metadata but won’t outperform a proper graph engine on multi-hop traversal.
Mem0 — install + recipe
What it does best
Mem0 is right when you already have an agent and want it to remember things between sessions without rebuilding around a memory layer. The API is the shortest in this comparison: memory.add(messages, user_id=...) writes, memory.search(query, user_id=...) reads. Behind that surface is multi-signal retrieval fusion — semantic + BM25 + entity match scored in parallel — which the April 2026 algorithm update reports at 91.6 on the LoCoMo benchmark with 6.7-6.9K token overhead per retrieval. Headline strength is API minimalism plus a real free tier on the hosted product.
Pick this if you...
- Have an existing agent (LangChain, LlamaIndex, custom loop) and want to bolt on memory in under an hour
- Want a free hosted entry point (10k memories / 1k retrievals on the Hobby plan) before deciding whether to self-host
- Care most about semantic memory — stable user preferences, recent facts — and less about chronologically correct fact lookup
- Need MCP-first access from agents in Cursor, Claude Code, or Claude Desktop — Mem0 has an official MCP server already in this directory
Recipe: add memory to an existing agent in five lines
# pip install mem0ai
from mem0 import Memory

memory = Memory()

# Write
memory.add(
    [{"role": "user", "content": "I prefer metric units and bullet points."}],
    user_id="alex",
)

# Read
results = memory.search(
    query="What does Alex prefer?",
    user_id="alex",
    limit=5,
)
for r in results["results"]:
    print(r["memory"])

That’s the whole loop. In your agent’s pre-prompt, run memory.search with the current user query and prepend the top-k memories to the system message. Post-turn, pipe the last few messages into memory.add. Mem0 handles extraction, deduplication, and fact merging so the store doesn’t grow linearly with conversation length.
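Wiring that loop into an agent turn looks roughly like this. `run_turn` and `llm_complete` are hypothetical names, and the fake memory object exists only so the sketch runs standalone; in practice you would pass a real `Memory` instance.

```python
def run_turn(memory, llm_complete, user_id: str, user_msg: str) -> str:
    # Pre-prompt: fetch top-k memories and prepend them to the system message.
    hits = memory.search(query=user_msg, user_id=user_id, limit=5)
    context = "\n".join(f"- {h['memory']}" for h in hits.get("results", []))
    system = f"Known about this user:\n{context}" if context else ""
    reply = llm_complete(system=system, user=user_msg)
    # Post-turn: push the exchange back; extraction and dedup happen inside.
    memory.add(
        [{"role": "user", "content": user_msg},
         {"role": "assistant", "content": reply}],
        user_id=user_id,
    )
    return reply

class _FakeMemory:
    """Stand-in with Mem0-shaped add/search so the sketch runs offline."""
    def __init__(self):
        self.writes = []
    def search(self, query, user_id, limit):
        return {"results": [{"memory": "prefers metric units"}]}
    def add(self, messages, user_id):
        self.writes.append(messages)

mem = _FakeMemory()
reply = run_turn(mem, lambda system, user: f"[{system}] ok", "alex", "hi")
print(reply)
```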
Skip it if...
Your problem is chronological correctness across a long conversation graph — Zep’s temporal validity windows are a structural advantage Mem0 doesn’t match. Or your memory is really a document graph — Cognee or Zep’s Graphiti will give you multi-hop traversal Mem0 has to approximate through repeated lookups. Mem0’s sweet spot is “remember what this user prefers,” not “reason across a thousand documents.”
Letta — install + recipe
Tiered runtime · MemGPT lineage
Letta
Apache 2.0 · stateful agents-as-a-service with self-edited core memory + archival store
What it does best
Letta is the only framework here where memory defines the architecture rather than supplementing it. Inheriting from MemGPT, it gives every agent a small core memory (a few hundred tokens of structured blocks the agent reads and writes itself via tool calls) and a much larger archival memory the agent searches for older context. The Letta server hosts agents as addressable services: create once, stable identity, every interaction picks up where the last left off. Headline strength is durable agent personality — a user returns three weeks later and the agent still knows their name, preferences, and open threads, no system-prompt stitching required.
Pick this if you...
- Want a stateful agent runtime rather than a stateless agent plus a memory library
- Are building multi-agent systems where each agent needs a persistent identity (support agent, research agent, scheduling agent — each with its own evolving core memory)
- Trust the agent to self-edit its core memory blocks rather than wanting deterministic write paths
- Are coming from the LangGraph / CrewAI / AutoGen world and want an opinionated alternative — see our LangGraph vs CrewAI vs Letta vs AutoGen comparison
Recipe: create a persistent agent with core memory
# pip install letta-client
from letta_client import Letta

client = Letta(token="YOUR_TOKEN")  # or self-hosted base_url

# Create a stateful agent once
agent_state = client.agents.create(
    model="openai/gpt-5.2",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "human", "value": "Name: Alex. Prefers metric units."},
        {"label": "persona", "value": "I am a research assistant."},
    ],
    tools=["web_search", "fetch_webpage"],
)

# Subsequent calls reuse the same agent — memory persists
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[{"role": "user", "content": "Remember I'm moving to Berlin."}],
)
# Agent self-edits core memory via tool calls; archival memory
# absorbs the rest. Pick the agent_id back up next session.

The agent itself can call core_memory_replace and archival_memory_insert as tools during its turn, which is the MemGPT-style self-management. You don’t write to memory directly; you trust the agent to do it. Some teams find that liberating; others find it unpredictable. Try both before committing.
Skip it if...
You already have an orchestration framework (LangGraph, CrewAI, your own loop) and don’t want a second runtime. Letta is opinionated; sliding it under an existing framework is more friction than picking Mem0 as a pure memory library. Also skip it if you need precise control over what gets remembered and forgotten — Letta’s self-editing model trades determinism for emergence.
Zep — install + recipe
Temporal knowledge graph · powered by Graphiti
Zep
Apache 2.0 · auto-extracted entities, facts, and relationships with valid_at / invalid_at windows
What it does best
Zep is right when your agent is conversational and chronological correctness matters. Every message pushed runs through a pipeline that extracts entities and facts, links them into a graph, and attaches a temporal validity window to each fact. The Graphiti engine underneath is open-source and usable without Zep Cloud. Headline strength is the question nothing else in this comparison answers cleanly: “what was true about this user as of last Tuesday afternoon?” For customer success, longitudinal coaching, mental-health support, or any domain where conversational context drifts across weeks, the temporal index is structural, not cosmetic.
Pick this if you...
- Build conversational agents that talk to the same user across days, weeks, or months
- Need facts to expire automatically when newer facts contradict them (the user changed jobs, moved cities, switched preferences)
- Want the Graphiti graph engine without writing graph code — the Zep SDK abstracts node/edge creation behind a message-shaped API
- Are comfortable running a graph backend (Neo4j-class) or paying for Zep Cloud’s managed graph
Recipe: chat memory with temporal facts
# pip install zep-cloud
from zep_cloud.client import Zep
from zep_cloud.types import Message

zep = Zep(api_key="YOUR_KEY")

user_id = "alex"
session_id = "alex-session-001"
zep.user.add(user_id=user_id, email="[email protected]")
zep.memory.add_session(session_id=session_id, user_id=user_id)

# Push conversation — facts get extracted into the temporal graph
zep.memory.add(
    session_id=session_id,
    messages=[
        Message(role="user", content="I just moved from Amsterdam to Berlin."),
        Message(role="assistant", content="Got it — Berlin now."),
    ],
)

# Retrieve a memory bundle scoped to the session
memory = zep.memory.get(session_id=session_id)
print(memory.context)  # facts + relevant prior summaries

# Query the temporal graph directly
results = zep.graph.search(
    user_id=user_id,
    query="Where does Alex live?",
    scope="edges",  # ranked facts
)
for r in results.edges:
    print(r.fact, r.valid_at, r.invalid_at)

If next month Alex says “I moved to Lisbon,” Zep writes a new fact-edge with the new valid_at and retracts the Berlin edge by setting its invalid_at. A query in August naturally returns Lisbon. Mem0 can model this with metadata; Zep makes it the primary index.
Skip it if...
Your agent doesn’t actually talk to the same user repeatedly, or your memory problem is shaped like a document corpus rather than a conversation log. Zep’s strengths are wasted on stateless agents. Also skip if you want minimal infrastructure — the graph backend is a real operational commitment compared to Mem0’s pgvector / Qdrant story. Zep Cloud’s free plan (1,000 credits/mo) is the easiest way to try the temporal model without standing up Neo4j.
Cognee — install + recipe
Full knowledge graph · graph-RAG layer
Cognee
Apache 2.0 · builds a knowledge graph from unstructured input with the cognify pipeline
What it does best
Cognee is right when your “memory” is really a knowledge-management problem. Feed it documents, code, emails, structured records, transcripts — anything textual — and the cognify pipeline produces a knowledge graph: entities as nodes, relationships as edges, embeddings for fuzzy lookup. Recall auto-routes queries to whichever search strategy fits. Cognee’s strongest relative claim is backend flexibility — adapters for multiple graph stores (Neo4j, Kuzu, NetworkX) and multiple vector stores, so you assemble the stack you already trust.
Pick this if you...
- Have a corpus of documents, transcripts, or structured data the agent needs to reason across — not just conversations to remember
- Need multi-hop graph traversal (“which people on the Atlas project also worked on Helios”) rather than nearest-neighbour over chunks
- Want to bring your own graph backend (already running Neo4j? already happy with Kuzu?) without buying into a hosted product
- Are comfortable thinking in graph terms — entities, edges, schemas — rather than “just add a memory.”
Recipe: ingest documents into a knowledge graph
# pip install cognee
import asyncio

import cognee

async def main():
    # Ingest unstructured input — Cognee handles chunking, entity
    # extraction, edge creation, and embedding under the hood
    await cognee.add(
        "Atlas was a 2024 project led by Alex. "
        "Helios followed in 2025 with the same core team."
    )

    # Build the graph from what's been added
    await cognee.cognify()

    # Recall — auto-routes between graph traversal, semantic, hybrid
    results = await cognee.search(
        query_text="Which projects has Alex led?",
        query_type="GRAPH_COMPLETION",
    )
    for r in results:
        print(r)

asyncio.run(main())

The cognify call is doing the heavy lifting: extracting entities, inferring relationships, and writing them to your chosen graph backend. search with a query type lets you switch between graph completion, semantic recall, or summarisation depending on what the agent needs. The same pipeline ingests Markdown, PDFs, raw text, or programmatic input through the same add surface.
Skip it if...
Your problem is “remember what this user said yesterday” — that’s Mem0 or Zep, not Cognee. The graph machinery is unnecessary overhead for chat-shaped memory, and you’ll spend more time thinking about schemas than benefiting from them. Pick Cognee when entities and relationships are first-class concerns, not when you wish they were.
Pick your shape: when memory ≠ vector DB
Before picking any of these four, ask whether you even need a memory framework or whether a vector database would do. Full head-to-head is in our Chroma vs Pinecone vs Qdrant vs Weaviate vs pgvector comparison. Short version: a raw vector DB is right when your agent only needs nearest-neighbour lookup over arbitrary chunks — no extraction, no dedup, no forgetting, no temporal correctness. “Here’s 50,000 documents, return the five most similar.”
Memory frameworks earn their keep when you need at least one of four hard problems solved beyond similarity. Fact extraction: turn multi-turn conversation into atomic facts so you don’t store “I’m a vegetarian,” “I don’t eat meat,” and “no animal products please” as three items. Merging / dedup: consolidate when the user repeats a fact with variation. Forgetting: when newer contradicts older (“I moved to Berlin” → “I moved to Lisbon”), the older fact loses primacy. Query routing: pick latest fact, chronological history, summary, or entity graph per query.
Mem0 solves all four for the chat-preference shape. Letta gives the agent self-edit tools and lets it figure out what to remember. Zep solves them with temporal validity. Cognee solves them by promoting entities and relationships to first-class citizens. A vector DB solves none of them. If you find yourself reinventing extraction or forgetting on top of a vector DB, you’re building a worse version of the framework you didn’t pick. Buy, don’t build.
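A ten-line experiment shows why the merging problem alone justifies buying. Token-overlap dedup, the obvious hand-rolled approach (sketched here with a hypothetical Jaccard threshold), catches rephrasings but misses exactly the semantic merges that matter:

```python
def jaccard(a: str, b: str) -> float:
    # Token-set overlap as a cheap similarity proxy.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def add_with_merge(store: list[str], fact: str, threshold: float = 0.5) -> None:
    for existing in store:
        if jaccard(existing, fact) >= threshold:
            return  # near-duplicate: keep the existing phrasing
    store.append(fact)

facts: list[str] = []
for phrasing in ["no animal products please",
                 "no animal products for me please",
                 "I don't eat meat"]:
    add_with_merge(facts, phrasing)
print(facts)
```

The second phrasing merges into the first, but "I don't eat meat" survives as a separate fact even though it expresses the same preference; closing that gap needs embeddings plus an LLM judgement call, which is the write-path work the frameworks do for you.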
One more axis: where does memory live in your stack? Mem0 and Cognee are libraries you call from orchestration code. Zep is a separate service your orchestration layer queries. Letta replaces the orchestration layer with its own server. Stack compatibility isn’t neutral — picking Letta means picking out of the LangGraph world; picking Mem0 means staying in it. See our LangGraph vs CrewAI vs Letta vs AutoGen comparison for the orchestration side.
Common pitfalls
Treating memory like an append-only log
The first instinct is to memory.add every message. Within a week the store is full of near-duplicate facts, the retrieval surface gets noisier, and the model starts citing stale preferences. Let the framework do extraction and dedup — that’s the whole point. Filter what you push: tool outputs and system messages rarely need to land in long-term memory.
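A minimal pre-write filter of the kind this pitfall calls for might look as follows; the role names are standard chat-completions conventions, not any framework's API.

```python
MEMORABLE_ROLES = {"user", "assistant"}

def filter_for_memory(messages: list[dict]) -> list[dict]:
    # Keep only human-facing turns; drop system prompts, raw tool
    # output, and assistant turns that are pure tool invocations.
    return [m for m in messages
            if m.get("role") in MEMORABLE_ROLES
            and not m.get("tool_calls")]

turn = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "I prefer metric units."},
    {"role": "assistant", "content": "", "tool_calls": [{"name": "search"}]},
    {"role": "tool", "content": "raw tool output"},
    {"role": "assistant", "content": "Noted: metric units."},
]
print(filter_for_memory(turn))
```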
Picking by GitHub stars
All four projects have healthy star counts and active communities. The popularity signal collapses the architecture decision into a meme. Mem0 isn’t “better than” Zep — they solve different problems. Re-read the architectures section before pressing star.
Forgetting that memory costs tokens
Every retrieval expands the prompt. Even a tight setup pulls 5-10k tokens per turn for context. Multiply by your turn count and you’ve added a meaningful line item to the model bill before counting any framework fees. Mem0’s reported 6.7-6.9k overhead per retrieval is realistic — budget for it.
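The arithmetic is worth doing explicitly. Using the midpoint of the reported overhead and a placeholder input-token rate (substitute your model's real price):

```python
tokens_per_retrieval = 6_800          # midpoint of the reported 6.7-6.9k range
turns_per_day = 5_000                 # hypothetical product volume
input_price_per_1m_tokens = 2.50      # USD, placeholder model rate

daily_tokens = tokens_per_retrieval * turns_per_day
monthly_cost = daily_tokens * 30 / 1_000_000 * input_price_per_1m_tokens
print(f"{daily_tokens:,} tokens/day -> ${monthly_cost:,.2f}/month")
```

At these example numbers the memory overhead alone is a four-figure monthly line item, before any framework fees or extraction-side calls.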
Trusting self-editing blindly (Letta)
Letta’s agent edits its own core memory. That’s powerful when it works and frustrating when the agent writes the wrong thing into a block and refuses to retract. Build a periodic audit: dump the core blocks, eyeball them, rewrite manually if the agent drifted. The pattern is the same risk shape as letting an agent edit its own system prompt — useful, dangerous, audit-required.
Graph schemas you regret (Cognee / Zep)
A knowledge graph is only as useful as its entity and relationship taxonomy. Auto-extraction will surface duplicate entity labels (“Acme” vs “Acme Corp” vs “Acme Corporation”), conflated relationships (“works_at” vs “employed_by”), and noise edges. Plan a quarterly graph audit; both Zep and Cognee expose tools for entity merge and relationship cleanup, but neither will save you from schema drift without supervision.
Cost surprises on the extraction pipeline
Mem0, Zep, and Cognee all run LLM calls during the write path to extract facts or build the graph. On a high-volume conversational product, that bill rivals the retrieval bill. Pin a cheap model for extraction (4o-mini class is usually fine), and don’t re-extract on every message — batch writes where the framework allows.
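With Mem0, pinning the extraction model is a config concern. The shape below follows Mem0's documented `from_config` pattern, but treat the exact keys as something to verify against the current docs rather than gospel:

```python
# Cheap model on the write path; embeddings stay wherever you already are.
config = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o-mini", "temperature": 0.0},
    },
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
}
# from mem0 import Memory
# memory = Memory.from_config(config)   # requires mem0ai + API keys
print(config["llm"]["config"]["model"])
```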
Lock-in by migration laziness
Each framework’s native data model is genuinely different. If you store memory only inside the framework, swapping later costs you. The fix is cheap: keep the raw event stream (conversations, document ingests) in your own Postgres or S3, and treat the framework as an index over it. Then migration is a re-ingest, not a salvage job.
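The pattern in miniature: an append-only event log you own, with the framework's write call (here a hypothetical `index_event` callback) only ever fed by replay.

```python
import json
import time

class EventLog:
    def __init__(self):
        self.rows: list[str] = []   # stand-in for Postgres/S3

    def append(self, kind: str, payload: dict) -> None:
        self.rows.append(json.dumps(
            {"ts": time.time(), "kind": kind, "payload": payload}))

    def replay(self, index_event) -> int:
        # Migration = rebuild the derived index from the source of truth.
        for row in self.rows:
            index_event(json.loads(row))
        return len(self.rows)

log = EventLog()
log.append("message", {"role": "user", "content": "I moved to Berlin"})
log.append("message", {"role": "user", "content": "I moved to Lisbon"})

indexed = []
migrated = log.replay(indexed.append)   # re-ingest into the "new" framework
print(migrated)
```

Swapping frameworks then means swapping the callback, not exporting and translating a proprietary store.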
Community signal
The agent-memory space matured through 2025 — MemGPT renamed to Letta, Zep doubled down on Graphiti, Mem0 shipped multi-signal retrieval fusion in April 2026, Cognee published research on the LLM-graph interface. That’s a younger conversation than the docs-RAG space, and the verbatim discourse is thinner than for tools like Context7 or DeepWiki. We won’t fabricate quotes.
Two patterns are consistent across GitHub discussions, vendor blogs, and conference talks. First, “just use a vector DB” is fading; teams shipping long-running agents pick a memory framework because the four hard problems hit them within weeks of go-live. Second, nobody uses two of these in production simultaneously — running both Mem0 and Zep means owning two extraction pipelines and two retrieval surfaces, doubling latency, cost, and ambiguity. Pick one, commit, revisit in a quarter.
Fastest signal you can collect yourself: clone each project, run its quickstart against a real transcript from your product, measure time-to-first-useful recall, token cost per turn, and behaviour on fact retraction. The framework that wins on your transcripts is the framework that wins.
Frequently asked questions
Which agent memory framework should I pick if I'm starting from scratch?
It depends on the shape of the agent. For a single-user assistant where you mostly want it to remember preferences and recent conversation, Mem0 has the cleanest API and the shortest path from zero to working memory — five lines of Python. For a multi-agent system where each agent needs its own persistent identity and self-managed scratchpad, Letta is the better fit because memory is the architecture, not a layer bolted on top. For a chat-style support bot where the conversation graph (who said what, when, about whom) is the product, Zep's temporal knowledge graph pays for itself. For document- and entity-heavy workloads where the agent needs to traverse relationships between things rather than recall what someone said, Cognee is closer to graph-RAG than to memory in the chat sense. None of these are substitutes — they overlap less than the marketing pages suggest.
Is Mem0 free? What does the hosted plan cost?
The Mem0 Python SDK on GitHub is Apache 2.0 and free to self-host. The Mem0 Cloud Platform at app.mem0.ai has a free Hobby tier (10,000 stored memories, 1,000 retrieval requests/mo, one project, community support), then Starter at $19/mo (50k memories, 5k retrievals), Growth at $79/mo (200k memories, 20k retrievals, three projects, basic analytics), Pro at $249/mo (500k memories, 50k retrievals, unlimited projects, private Slack), and a usage-based Enterprise tier. A startup program offers Pro free for three months to companies under $5M funding. Always confirm at mem0.ai before signing.
What does Letta actually do that LangGraph doesn't?
LangGraph is a graph-based orchestrator for LLM workflows; Letta is an agent runtime where the agent has a stateful identity that persists between sessions. The key distinction is the MemGPT-inspired memory model Letta inherited from its predecessor: core memory (small, always in context, self-edited by the agent through tool calls) plus archival memory (large, searchable, externalized). LangGraph leaves persistent state up to you. Letta makes it the framework's job. If you want full control over the orchestration graph, use LangGraph. If you want a server that hosts agents-as-a-service and never loses what they learned about a user, use Letta. They coexist — see our LangGraph vs CrewAI vs Letta vs AutoGen comparison for when each shape fits.
How is Zep different from a vector database with a chat wrapper?
A vector database stores message embeddings and retrieves nearest neighbours. Zep does that, then does two more things: it auto-extracts entities, facts, and relationships from message history into a knowledge graph, and it timestamps every fact with valid_at and invalid_at dates so the graph remains chronologically queryable. The Graphiti engine underneath is what powers this — it's open-source and you can use it without the Zep Cloud product. The practical difference is that a vanilla vector DB will hand the model the five most semantically similar messages; Zep will hand it the five facts that are still true as of yesterday plus a small temporal graph of how those facts changed. For a long-running customer-success agent that talks to the same user across months, the difference is the difference between coherent and forgetful.
When is Cognee the right answer and when is it overkill?
Cognee is right when your memory problem is shaped like a graph: you're ingesting documents, emails, meeting notes, code, or any unstructured corpus where entities (people, projects, decisions, files) have meaningful relationships to each other. Cognee builds a knowledge graph through its remember/recall/forget/improve pipeline, so the agent traverses entities rather than scanning chat history. It's overkill when the agent's job is to remember 'the user prefers metric units' across two sessions — that's a Mem0 problem, not a Cognee problem. Picking the wrong shape costs you weeks: a graph engine for chat preferences feels like dragging a freight train to the corner shop.
Can I run any of these on a single self-hosted Postgres + Redis stack?
Mem0 self-hosts cleanly with Postgres plus a vector store (Qdrant, Chroma, pgvector all supported); the Docker compose path in the repo is the fastest start. Letta runs as a server (FastAPI) and persists to Postgres by default — single host, one container, done. Zep self-hosts but expects Postgres plus a Neo4j-style graph backend; the Community Edition is open-source and the documentation walks you through it. Cognee is the most flexible: it ships adapters for multiple graph backends (Neo4j, Kuzu, NetworkX) and multiple vector stores, so you assemble the stack you want. All four are Apache 2.0, so the licensing path is clean.
Do these memory frameworks expose an MCP server?
Yes, with caveats. Mem0 has an official Mem0 MCP server you can find on mcp.directory (linked in this post) — an agent in Cursor or Claude Code can read and write memories directly. Letta exposes its agents through MCP-compatible tools and a server endpoint; community MCP shims exist on top of the Letta REST API. Zep and Cognee have community MCP wrappers in early stages but no first-party servers at the time of writing. If MCP-first is a hard requirement today, Mem0 is the most polished route; the others get there via their REST or Python APIs.
Mem0 vs Zep — both call themselves 'memory for agents.' Which wins on retrieval quality?
They optimise for different question shapes. Mem0 wins on 'recall this user's stable preferences and recent facts' because its single-pass extraction plus multi-signal retrieval fusion (semantic + BM25 + entity matching) is tuned for that exact pattern; the team reports 91.6 on the LoCoMo long-conversation benchmark in their April 2026 update. Zep wins on 'recall what was true at this point in time' because its temporal knowledge graph gives every fact a validity window — facts that were retracted in last week's conversation won't surface as still-true. For static preferences and recent context, Mem0 retrieves faster and at lower token cost. For long-running agents where the conversation graph evolves, Zep's chronological structure usually pays for itself by month two. Run both on your transcripts before picking.
How do I migrate from one memory framework to another later?
Plan for it up front because the data models genuinely differ. Mem0 stores discrete memory items keyed by user_id with metadata — exporting that to JSON is straightforward and re-ingesting into Zep or Cognee means treating each memory as a message or fact. Letta's core memory blocks plus archival rows can be exported via the REST API and replayed into another store. Zep's graph is harder to migrate verbatim because nodes and edges with temporal validity windows don't have a direct equivalent in flat memory stores; you usually flatten it back to messages plus the latest facts. Cognee's graph is similar — easy to extract, lossy when flattening. The pragmatic advice: store the source-of-truth event stream (raw conversations, raw documents) separately, and treat each memory framework's index as derived. Then re-ingest is just a re-build.
Why not just use a vector database (Chroma, Qdrant, pgvector) directly?
You can, and for some workloads you should — see our Chroma vs Pinecone vs Qdrant vs Weaviate vs pgvector comparison. A raw vector DB gives you fast semantic search over arbitrary chunks but leaves four hard problems to you: fact extraction (turning conversations into atomic facts), deduplication and merging (the user said this twice, with slightly different wording), forgetting (retracting outdated facts when newer ones arrive), and query routing (deciding whether the model needs the latest fact, the chronological history, or a summary). Mem0, Letta, Zep, and Cognee each solve a different subset of those problems. If your agent only needs nearest-neighbour over chunks, a vector DB is enough. If it needs any of the four problems above solved, you're paying the cost of a memory framework one way or another — buy or build.
Sources
Mem0
- github.com/mem0ai/mem0 — Apache 2.0 Python SDK, README, install instructions
- docs.mem0.ai — API reference, self-host docs, integration guides
- mem0.ai/pricing — Hobby / Starter / Growth / Pro tiers and limits
- /servers/mem0 — Mem0 MCP server install card on this directory
Letta
- github.com/letta-ai/letta — Apache 2.0 agent runtime (formerly cpacker/MemGPT)
- docs.letta.com — concepts, memory blocks, SDK reference
- arxiv.org/abs/2310.08560 — original MemGPT paper (core / archival memory model)
Zep
- github.com/getzep/zep — Apache 2.0 memory store + Graphiti graph engine
- help.getzep.com — full documentation, SDK reference, schema model
- getzep.com/pricing — Free / Flex / Flex Plus / Enterprise tiers
- github.com/getzep/graphiti — underlying temporal-graph engine, open-source
Cognee
- github.com/topoteretes/cognee — Apache 2.0 knowledge-graph memory framework
- docs.cognee.ai — pipelines, adapters, SDK reference
Related comparisons
- /blog/chroma-vs-pinecone-vs-qdrant-vs-weaviate-vs-pgvector-mcp-2026 — when a raw vector DB is the right answer
- /blog/langgraph-vs-crewai-vs-letta-vs-autogen-2026 — orchestration framework comparison (includes Letta)
- /blog/goose-vs-cline-vs-aider-vs-claude-code-vs-opencode-2026 — coding-agent shootout