Skip to main content
Rohit Raj
होमप्रोजेक्ट्सServicesरिपॉज़नोट्सपरिचयसंपर्कवर्तमान कार्य देखें
← Back to Notes

AI Agent Memory in 2026: Mem0 vs Zep vs Letta vs MemPalace (Open-Source, Benchmarked)

Rohit Raj·June 6, 2026·13 min read

Agent memory — not the model — is the 2026 bottleneck. MemPalace just hit 54.1k GitHub stars and shipped v3.4.0 with a 96.6% LongMemEval score and zero API calls. Here's how the four open-source AI agent memory layers (Mem0, Zep, Letta, MemPalace) actually compare on architecture, real benchmarks, and honest licensing — plus a code snippet to add memory in minutes and how I'd wire it into a production agent.

open source ai agent memoryai agent memorymem0 vs zep vs lettaself-hosted ai agent memory
Glowing knowledge-graph nodes on a dark grid illustrating open-source AI agent memory layers compared in 2026

TL;DR

Agent memory — not the model — is the bottleneck in 2026. The breakout: **MemPalace** hit ~54.1k GitHub stars and shipped v3.4.0 on June 6, 2026, scoring 96.6% R@5 on LongMemEval with zero API calls (fully local, MIT). The field: Mem0 (largest community, cloud-first), Zep (temporal knowledge graph, now mostly SaaS), Letta (OS-style tiered memory, fully self-hostable), MemPalace (local-first, verbatim). Pick Mem0 for fast personalization, Zep for time-aware facts, Letta for long-running agents, MemPalace for offline/verbatim recall. The self-host licensing truth matters more than the benchmark bragging.

Why Agent Memory Is the Real 2026 Bottleneck

By Rohit Raj — Founding Engineer · 10+ yrs MVP shipping · LinkedIn

When a *memory library* — not a frontier model — is the week's breakout GitHub repo, the industry has told you where the hard problem moved. **MemPalace** crossed ~54.1k stars and cut v3.4.0 on June 6, 2026, and it did it on a single, almost rude claim: 96.6% R@5 on the LongMemEval benchmark with zero API calls — no cloud, no embeddings bill, running on your laptop.

The capability that changed is *where* memory lives. For two years the default answer to "how does my agent remember?" was "ship every turn to a hosted vector service and pay per token." MemPalace's local-first, verbatim-storage design — and its 88.9% R@10 on LoCoMo — says you can get state-of-the-art recall without leaking your data or your budget. That reframes the whole build-vs-buy decision for anyone shipping an agent.

Here's why this matters *now*: every "best AI agent memory" listicle ranks Mem0, Zep, and Letta, but most were written six to twelve months ago, gloss over which tools are *actually* open-source versus SaaS-in-disguise, and don't include the local-first newcomer. Below is the honest comparison — architecture, real (not self-reported) benchmarks, a code snippet to add memory in minutes, and how I'd wire it into a production agent without the failure modes the READMEs skip. It's the same evergreen-vs-hype tension I hit comparing vector databases for RAG — the memory layer sits right on top of that choice.

What Actually Shipped — and the Local-First Shift

Strip the launch noise and here is the concrete surface area, from the MemPalace repository:

  • Local-first, verbatim storage. It stores the actual text of a conversation — not an LLM summary — and retrieves with semantic search over a *structured* index (people and projects as "wings," topics as "rooms," content as "drawers"). No summarization means no silent information loss.
  • Pluggable backends. ChromaDB by default, plus SQLite, Qdrant, and pgvector — so your memory store can be the same Postgres you already run.
  • A temporal knowledge graph backed by SQLite, so facts carry a sense of *when* they were true, not just *that* they were said.
  • 29 MCP tools out of the box, with auto-save hooks for Claude Code and Gemini CLI, plus local models via Ollama and Docker images. It speaks Model Context Protocol natively, which is the integration path that actually matters in 2026.
  • MIT licensed, Python (94.5%). You fork it and own it — no per-seat pricing, no graph-memory paywall.

The numbers that make people star a repo: 96.6% R@5 on LongMemEval (raw, no LLM in the loop), 98.4% on a tuned hybrid configuration, 88.9% R@10 on LoCoMo (1,986 questions), 92.9% average recall on ConvoMem, and 80.3% R@5 on MemBench across 8,500 items — that last one matters because most memory tools degrade hard as the index grows, and a score at 8.5k items is a scale signal, not a toy-dataset score. The "zero API calls" framing is the real story — it puts privacy-preserving, offline agent memory in the same accuracy bracket as the cloud incumbents.

Why verbatim instead of summarization? Because summarized memory is *lossy by design*: the moment an LLM compresses "the user's invoice on March 12 was ₹48,200, disputed on March 14" into "user had a billing issue," you've thrown away the exact facts your agent will later be asked to recall. Verbatim storage keeps the raw text and leans on the structured index for retrieval — slower to write, far safer to recall. That's the local-first shift, and it's the same instinct behind cutting cloud dependency that I wrote about in LLM context compression.

How Does Each One Actually Remember?

The four contenders solve memory with genuinely different architectures. This is the part that decides fit — benchmarks come later.

Mem0 — the cloud-first universal layer. Mem0 (~47k+ stars, the largest community) combines vector, graph, and key-value storage with *automatic* memory extraction: you hand it a conversation, it decides what's worth remembering. Its ECAI 2025 paper reports a +26% accuracy lift over a naive baseline. The catch: the slick experience is cloud-first, and graph memory requires the paid Pro plan — the open-source SDK gives you the vector layer, not the full graph.

Zep — temporal reasoning done right. Zep is built on the Graphiti temporal knowledge graph: every fact is timestamped, so if a user says "I moved from London to Tokyo," Zep models the *state change* instead of returning both cities as current. It reports up to a 18.5% accuracy gain on LongMemEval for time-sensitive recall. The catch: Zep has stepped back from fully open self-hosting — Graphiti is open, but the full platform is SaaS from ~$25/month.

Letta (formerly MemGPT) — the OS for agent memory. Letta treats memory like an operating system: the LLM itself manages tiers — core context, recall, and archival — deciding what to page in and out. It's the right model for agents that run autonomously for days, and it's fully open-source and self-hostable. Independent evals put it around ~83% on LoCoMo, the highest verified score among the classic open-source tools.

MemPalace — local-first verbatim. The newcomer above: no API calls, MIT, 88.9% LoCoMo / 96.6% LongMemEval, native MCP. The trade is that it's young (v3.4.0) and its "palace" indexing model is opinionated. If you're building an AI chatbot or assistant where data can't leave the box, this is the one to test first.

How Do You Actually Add Memory to an Agent?

The reason memory libraries win stars is that the happy path is short. With Mem0, adding persistent memory is roughly four lines — extraction and retrieval are handled for you:

python
# Mem0 — add and search memory (open-source SDK, vector layer)
from mem0 import Memory

m = Memory()
m.add("User prefers Postgres over MongoDB for OLTP", user_id="rohit")

# later, in another session:
hits = m.search("which database does the user like?", user_id="rohit")
# -> ranked memories the agent can inject into its prompt

MemPalace trades that hosted convenience for a local server you fully control — no key, no egress, and it exposes itself to your agent over MCP:

bash
# MemPalace — local-first, no API key, 29 MCP tools
pip install mempalace
mempalace serve --backend chromadb        # or sqlite / qdrant / pgvector

# Point Claude Code or Gemini CLI at the MCP endpoint; auto-save
# hooks persist each turn verbatim into the "palace" index.

The thing the quickstarts won't tell you: the easy demo and the production system are different animals. m.add() and m.search() look magical until you ask *what* got extracted, *when* it gets evicted, and *how* you debug a wrong recall at 2am. Cloud tools hide that machinery (and bill you for it); local tools expose it (and make you own it). Pick the trade deliberately — the same way I argue for owning your stack in multi-agent Claude Code workflows.

Benchmarks: LoCoMo and LongMemEval, Honestly

Three benchmarks define this field: LoCoMo (single-hop, multi-hop, open-domain, and temporal recall), LongMemEval (knowledge updates and multi-session recall), and the newer BEAM (evaluations at 1M and 10M token scales). Here's the honest scoreboard — note the column that vendor blogs omit:

SystemLoCoMoLongMemEvalLocal / zero-APILicense posture
MemPalace88.9% R@1096.6% R@5 (raw)✅ YesMIT, fully open
Mem091.6% (self-report) / ~58–66% (independent)94.8%⚠️ Graph = paid ProCore SDK open
Zep~85%+18.5% temporal gain⚠️ Graphiti onlyPlatform is SaaS
Letta~83%—✅ YesFully open

The single most important row to read carefully is Mem0's. Its own materials cite 91.6% on LoCoMo, but independent comparisons put the reproducible number closer to 58–66%. That gap isn't an accusation of bad faith — it's the eternal benchmark problem: vendors tune for their own eval harness. Treat any self-reported memory benchmark as an upper bound, not a guarantee, and re-run the eval on *your* data before you trust a leaderboard. The benchmarks measure retrieval accuracy; they say nothing about latency, token cost, or how the thing behaves when your context is messy — which is most of the time in production.

Is It Really Open-Source? The Licensing Truth Table

This is the section the listicles bury, and it's the one that decides your bill. "Open-source" on a landing page rarely means "you can self-host the whole thing for free." Here's the actual split:

ToolWhat's openWhat's gated behind paid/SaaSTruly free to self-host?
MemPalaceEverything (MIT)Nothing✅ Yes — local, zero API cost
LettaFull frameworkManaged cloud (optional)✅ Yes
Mem0Core vector SDKGraph memory + hosted Platform⚠️ Partial — graph needs Pro
ZepGraphiti engineFull Zep platform (~$25/mo+)⚠️ Mostly no

The pattern: the two tools with the loudest marketing (Mem0 and Zep) are the two where the best features sit behind a paywall, while the two that win on *actual* openness (Letta and MemPalace) are quieter. None of this makes Mem0 or Zep bad — a managed service you don't have to operate is worth real money. But if your reason for choosing "open-source memory" was *cost control* or *data residency*, you need to know that Mem0's graph and Zep's platform will pull you back onto a meter. For a regulated build — payments, health — "the data never leaves our infra" is a hard requirement, not a nice-to-have, and only the fully-open options clear that bar.

Which AI Agent Memory Layer Should You Pick?

There's no single winner — there's a right tool per shape of problem. Here's how I'd actually choose, and when to skip each.

Pick Mem0 if you want personalization fast and don't mind a meter: chatbots, recommendation memory, "remember the user's preferences" features where time-to-first-value beats data control. Skip it if you need the knowledge graph for free or can't send data to a third party.

Pick Zep when *temporal correctness* is the product — anything where a fact's validity changes over time (CRM state, account status, support history). Its Graphiti timestamping is genuinely best-in-class here. Skip it if you wanted a fully self-hosted free stack; you'll end up on the SaaS.

Pick Letta for long-running, autonomous agents that operate for hours or days and need to manage their own context window — the OS-style tiering is built exactly for that. Skip it for a simple "look up the last thing the user said" feature; it's more machinery than you need.

Pick MemPalace when data can't leave the box, when you want zero ongoing cost, or when verbatim recall matters more than clever summarization — on-device assistants, privacy-first tools, anything offline. Skip it (for now) if you need a battle-tested vendor with a support contract; at v3.4.0 it's new, and new means you'll hit edges.

The honest meta-point: most teams over-engineer this. If your "memory" need is really just retrieval over documents, you may want plain RAG over your database before any of these. Reach for a dedicated memory layer when the agent needs to remember *interactions*, not just *documents*.

How I'd Wire Agent Memory Into Production

Here's the concrete way I'd adopt this on a real client build, not the demo version.

Start local, prove value, then decide on cloud. I'd stand up MemPalace or Letta self-hosted first — zero cost, full data control — and only move to a managed Mem0/Zep tier if operating it myself becomes the bottleneck. Starting on the meter before you've validated the memory *actually helps* is how you burn budget on a feature users don't notice.

Make memory observable from day one. The failure mode that bites everyone: the agent confidently recalls something *wrong*, and you can't see why. I log every add and every search with the retrieved IDs and scores, so a bad recall is debuggable instead of mystifying. On a fintech build like myFinancial, a memory that surfaces a stale account balance isn't a glitch — it's a trust-destroying bug. Verbatim storage (MemPalace) helps here because you can audit exactly what was remembered.

Treat the MCP boundary as a security boundary. Wiring memory in over Model Context Protocol is clean, but a memory server that an agent can write to is a prompt-injection target — poison the memory, poison every future turn. Scope writes, validate inputs, and never let an untrusted tool silently mutate long-term memory.

Budget for eviction, not just storage. Every team plans how memory gets *written* and forgets how it gets *forgotten*. Unbounded memory means rising latency and cost and worse recall as the index fills with noise. Decide your eviction or summarization policy before launch.

Keep your memory layer swappable. Because the leaderboard numbers don't transfer to your data (see the Mem0 self-report gap above), I wire memory behind a thin interface — remember() and recall() — instead of calling a vendor SDK directly throughout the agent. That makes it a one-file change to A/B Mem0 against MemPalace on *your* real conversations, and to bail to a self-hosted option if a SaaS bill or a data-residency rule changes the math. The cost of that abstraction is an afternoon; the cost of hard-coding the wrong memory tool into 40 files is a rewrite.

If you want this kind of memory-and-reliability engineering built into your AI product from day one instead of bolted on after the first bad recall, that's the work I do: I run fixed-scope 6-week MVP builds, and you can hire a founding engineer in India to own the whole agent stack — model, memory, and the production wiring that makes it trustworthy.

RELATED PROJECT

View MyFinancial →

Build an AI Agent With Memory That Works in Production

Let's Talk →

Read Next

Claude AI Vulnerability Scanner: Anthropic's Open-Source Code-Security Harness (2026)

Anthropic open-sourced defending-code-reference-harness — a Claude-powered pipeline that finds and p...

Cut LLM Token Costs Up to 90% with Context Compression (2026)

Headroom hit #1 on GitHub Trending on June 4, 2026 with a tool that compresses tool outputs, logs, a...

← All NotesProjects →

रोहित राज — बैकएंड और AI सिस्टम्स

Services

Founding Engineer for Hire in IndiaMobile App DevelopmentAI Chatbot DevelopmentFull-Stack Development

अपडेट्स पाएं