The markdown memory ceiling
Three independent AI agent platforms worth billions converged on plain text files for memory storage. The convergence validates the problem. The failure modes they share define what comes next.
Key takeaways
- Three production-scale AI agent platforms (Manus, Claude Code, OpenClaw) independently converged on Markdown files as their primary memory system, validating that the agent memory problem is real and felt at billion-dollar scale.
- The convergence is driven by LLM economics: Manus's 100:1 input-to-output token ratio and the 10x cost gap between cached and uncached tokens make file-based memory a unit-economics strategy, not a mere preference.
- The failure modes are identical and documented across all three systems: no versioning, no provenance, no conflict detection, no entity resolution, no schema constraints, and concurrent writes that corrupt state.
- The proposed "equilibrium architecture" (files first, database when forced) has a gap: it jumps from text files to full database infrastructure with nothing for state integrity in between.
- The real incumbent in agent memory infrastructure is not vector databases. It is Markdown files, and the upgrade path runs from files to structured state, not from one database product to another.

---
title: "The markdown memory ceiling"
excerpt: "Three independent AI agent platforms worth billions converged on plain text files for memory. The convergence validates the problem. The failure modes they share define what comes next."
published: true
publishedDate: "2026-04-15"
category: "Agent Architecture"
tags: ["agent memory", "markdown", "state integrity", "convergent evolution", "file-based memory"]
read_time: 8
heroImage: "the-markdown-memory-ceiling-hero.png"
heroImageSquare: "the-markdown-memory-ceiling-hero-square.png"
heroImageStyle: "keep-proportions"
---
The convergence nobody planned
Manus is a consumer-facing AI agent. Claude Code is Anthropic's coding assistant. OpenClaw is an open-source personal AI. Different teams, different codebases, different business models.
All three store agent memory in markdown files.
Manus uses a todo.md checklist that rewrites itself after each step. OpenClaw uses MEMORY.md plus dated files in a memory/ directory. Claude Code uses hierarchical CLAUDE.md files scoped to directories, with a 200-line cap on always-loaded content.
None appeared to copy the others. Yaohua Chen on DEV Community called this "convergent evolution." When three independent systems under different constraints arrive at the same architecture, the architecture is telling you something about the problem.
Micheal Lanham documented this convergence in March 2026. His analysis of all three systems is the most thorough public comparison of production agent memory architectures I have seen. The data is worth engaging with directly.
Why files are the default starting point
The obvious explanation is simplicity. Files are human-readable, git-trackable, and require no infrastructure. True but incomplete.
The deeper reason is LLM economics.
Manus co-founder Yichao "Peak" Ji published the numbers. Manus processes 100 input tokens for every 1 output token. On Claude Sonnet, cached tokens cost roughly $0.30 per million. Uncached tokens cost $3 per million. That 10x spread means input cost dominates. Anything that increases KV-cache hit rates saves real money.
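The arithmetic is worth making explicit. Using the article's pricing figures, and assuming a simple blended-cost model (my assumption, not Manus's published methodology), input cost as a function of KV-cache hit rate looks like this:

```python
# Sketch of the unit economics above, using the article's figures.
# Prices are per million input tokens on Claude Sonnet; the blended-cost
# model is an illustrative assumption, not a published formula.
CACHED_INPUT = 0.30    # $/M tokens on a cache hit
UNCACHED_INPUT = 3.00  # $/M tokens on a cache miss

def input_cost_per_million(cache_hit_rate: float) -> float:
    """Blended input cost per million tokens at a given KV-cache hit rate."""
    return cache_hit_rate * CACHED_INPUT + (1 - cache_hit_rate) * UNCACHED_INPUT

# With 100 input tokens per output token, input cost dominates total spend,
# so raising the hit rate matters far more than output-side optimization.
for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: ${input_cost_per_million(rate):.2f} per M input tokens")
```

At a 90% hit rate the blended input cost falls from $3.00 to $0.57 per million tokens, which is why stable, cache-friendly context is a direct line item.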
File-based memory is stable, predictable text that plays well with KV-cache prefixes. Append-only context that rarely changes between calls means the model can reuse cached computations. A database-backed RAG system that assembles different context fragments each time defeats this optimization.
Manus's todo.md pattern is the clearest example. The agent rewrites the checklist after each step. This places the current plan in the most recent context window position. Information in the middle of long contexts gets ignored. A freshly rewritten plan file at the end of context fixes that with no retrieval infrastructure.
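The mechanics of that pattern can be sketched in a few lines. Manus's internals are not public, so the helper below is hypothetical; it shows the shape of the trick: rewrite the checklist file after each step, then append its text at the tail of the context, where attention is strongest.

```python
from pathlib import Path

def recite_plan(todo_path: Path, completed: list[str], remaining: list[str]) -> str:
    """Rewrite the checklist after each step and return its text so the caller
    can place it at the *end* of the prompt. (Hypothetical helper; illustrates
    the recitation pattern, not Manus's actual code.)"""
    lines = [f"- [x] {step}" for step in completed]
    lines += [f"- [ ] {step}" for step in remaining]
    text = "# todo.md\n" + "\n".join(lines) + "\n"
    todo_path.write_text(text)   # the file on disk stays current and inspectable
    return text                  # appended to the context tail on the next call
```

Because the file is rewritten whole each turn, the plan never drifts into the ignored middle of a long context.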
The economic argument extends beyond Manus. Claude Code caps always-loaded memory at 200 lines because memory files consume tokens every session. The constraint is not storage. It is attention budget. Files let you control what the model sees and where it appears in context.
These are not accidental choices. They are cost-aware architecture.
Where files break
Lanham's article is honest about the failure modes. That honesty is the most valuable part of the analysis.
Context budget pressure. Claude Code warns that large CLAUDE.md files reduce model adherence. Files work until they get bloated and internally contradictory. A 200-line cap is a pragmatic fix, not a solution. As agent use scales, the file grows, contradicts itself, and nobody knows which version of a fact is current.
Concurrency. Multiple agents writing to the same memory file corrupt state. Lanham states it directly: "The moment multiple agents or users need to touch the same memory, concurrent file writes can corrupt data." The single-agent ceiling is real. Most agentic workflows will not stay single-agent forever.
No versioning. Files get overwritten. OpenClaw's memory compaction triggers a silent agent turn that writes durable memories before truncation. What was in the file before compaction? Unknown. If the compacted version dropped a fact, it is gone. No observation log. No rollback.
No provenance. When an agent writes a memory entry, there is no record of what source produced it, when, or whether it contradicts something written last week. The file is a summary. Summaries obscure their ingredients.
No entity resolution. "Acme Corp" in one session and "ACME CORP" in the next. The agent re-infers identity each time from the context window. No stable IDs. No merge rules. No canonical entities. Every session is session-scoped inference.
No schema constraints. Any agent or tool can write anything to a memory file. No validation. No type checking. No enforcement of what a memory entry should contain. Bad writes propagate as truth.
These failures are not hypothetical. They are documented by the teams building these systems. They are the operational ceiling of file-based memory.
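The concurrency failure in particular is easy to reproduce. The sketch below (hypothetical file name) simulates two agents doing a read-modify-write of the same memory file; the interleaving is sequential here, but it is exactly what happens when two real processes race:

```python
from pathlib import Path

memory = Path("MEMORY.md")
memory.write_text("# Memory\n")

# Two agents read the same snapshot of the file...
snapshot_a = memory.read_text()
snapshot_b = memory.read_text()

# ...each appends its own fact to its private copy...
snapshot_a += "- Acme Corp renewed the contract\n"
snapshot_b += "- Invoice #1042 was paid\n"

# ...and the last writer wins. Agent A's fact is silently dropped,
# and with no version history there is nothing to recover it from.
memory.write_text(snapshot_a)
memory.write_text(snapshot_b)
```

No error is raised, no log records the loss, and the surviving file looks perfectly healthy. That is what "corrupt state" means in practice for file-based memory.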
The gap in the equilibrium
Lanham proposes an "equilibrium architecture" with four layers. Files as primary interface. Aggressive offloading to disk. Derived retrieval layers (vector index over files). Clear escalation to databases when concurrency and correctness demand it.
The first three layers are well-documented. The fourth is left as an exercise for the reader.
"Escalate to a database" assumes the database solves the integrity problems. Postgres does not give you versioned observations by default. It does not give you provenance chains. It does not give you deterministic entity resolution across documents. It does not give you schema constraints on agent-written state. Moving from a markdown file to a database table does not solve "no versioning." It solves "no concurrent access." Those are different problems.
The equilibrium has a gap between layers three and four. Between "markdown files that work for one agent" and "full database infrastructure" there is a missing layer. Structured state with integrity guarantees. No custom database schema required.
OpenClaw's architecture hints at this. Its hybrid retrieval (sqlite-vec with configurable vector/text weighting, temporal decay, and MMR diversification) is more sophisticated than simple file search. But it still treats the markdown files as the source of truth. The index is a read optimization, not a state integrity layer.
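The general shape of that scoring is worth seeing, even in miniature. This is not OpenClaw's actual code, and the weights and half-life are illustrative defaults I chose; it shows only the blend-then-decay pattern:

```python
def hybrid_score(vec_sim: float, text_score: float, age_days: float,
                 w_vec: float = 0.7, w_text: float = 0.3,
                 half_life_days: float = 30.0) -> float:
    """Blend vector and keyword relevance, then decay older memories.
    (Sketch of the general pattern; the 0.7/0.3 split and 30-day
    half-life are illustrative, not OpenClaw's configuration.)"""
    blended = w_vec * vec_sim + w_text * text_score
    decay = 0.5 ** (age_days / half_life_days)   # exponential temporal decay
    return blended * decay
```

Note what this layer can and cannot do: it reorders reads, but every score is computed over whatever the markdown files happen to contain, corrupted or not.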
The missing primitives are the same ones I identified running my own agentic stack:
- Versioned observations. Every write appended, nothing overwritten. Reconstruct state at any point in time.
- Provenance. Every fact traceable to a source, a timestamp, and the agent or human that wrote it.
- Deterministic entity resolution. Canonical IDs based on stable rules, not per-session inference.
- Schema constraints. Validation on writes. Bad data rejected before it enters the store.
These are not database features. They are state integrity features. You can build them on top of a database. Postgres will not give them to you out of the box. And you cannot get them from a markdown file at all.
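To make the four primitives concrete, here is a toy sketch of them working together: an append-only observation log with provenance on every write, a write-time schema check, and rule-based canonical IDs. All names are hypothetical; a production store needs far richer merge rules and durable storage.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Observation:
    entity_id: str    # canonical ID, not a per-session guess
    fact: str
    source: str       # provenance: which agent, tool, or document wrote this
    timestamp: float = field(default_factory=time.time)

def canonical_id(name: str) -> str:
    """Deterministic entity resolution by a stable rule, not inference.
    (Normalization is the simplest such rule; real systems need more.)"""
    return name.strip().lower().replace(" ", "-")

class ObservationLog:
    """Append-only: nothing is overwritten, so state at any point in time
    can be reconstructed from the log."""
    def __init__(self) -> None:
        self._log: list[Observation] = []

    def write(self, name: str, fact: str, source: str) -> Observation:
        if not fact or not source:  # schema constraint: reject bad writes
            raise ValueError("fact and source are required")
        obs = Observation(canonical_id(name), fact, source)
        self._log.append(obs)
        return obs

    def state_at(self, entity: str, as_of: float) -> list[str]:
        """Answer 'what did the agent know about this entity at time t?'"""
        eid = canonical_id(entity)
        return [o.fact for o in self._log
                if o.entity_id == eid and o.timestamp <= as_of]
```

With this shape, "Acme Corp" and "ACME CORP" resolve to the same entity, a write without a source is rejected, and last Tuesday's state is a query, not a guess.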
Files are the real incumbent
The most important strategic insight from Lanham's analysis is not about files vs databases. It is about what the actual competitive landscape looks like.
Memory infrastructure companies raised tens of millions positioning against retrieval problems. Mem0 raised $24M. Letta closed a $10M seed at a $70M valuation. Zep's Graphiti project crossed 20K GitHub stars. MemPalace hit 46K stars in its first two weeks with a local-first, verbatim-storage approach. They solve real problems: durability across deployments, personalization, retrieval at scale, and structured recall.
But the systems handling the most agent interactions are not using vector databases for memory. They are using text files. Production evidence from three billion-dollar-scale platforms confirms that the real default is not an existing database product. It is a file.
This changes the displacement story. The upgrade path is not from vector databases to something better. It is from markdown files to structured state. The people who need state integrity guarantees are not currently using Mem0 or Zep. They are currently writing to MEMORY.md.
Migration, not replacement
Lanham's closing advice is correct in spirit: "Start with a Markdown file. You can always add a database later." Files are a rational starting architecture. The economics support them. The inspectability is real. The simplicity matters.
The question is what "later" looks like.
I am building Neotoma as that upgrade path. Structured state with the integrity guarantees files lack: versioning, provenance, entity resolution, schema constraints.
The cost efficiency question matters. If the upgrade path sacrifices the KV-cache economics that made files rational, it is not a real upgrade. Neotoma's read path is designed around this constraint. Agents access it via MCP. The response is structured text injected into the context window, the same format a model would see from reading a file. Entity snapshots are stable between calls. The same entity queried twice returns the same text unless an observation changed it. Stable text means stable token sequences. Stable token sequences mean KV-cache hits.
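The stability property is the whole point, and it is easy to illustrate. The function below is not Neotoma's implementation, just a sketch of the invariant: render an entity snapshot deterministically, so identical facts produce identical bytes and therefore identical token sequences.

```python
def render_snapshot(entity_id: str, facts: list[str]) -> str:
    """Render an entity snapshot as deterministic text: same facts in,
    same bytes out, so repeated reads keep the prompt prefix cache-friendly.
    (Illustrative sketch, not a real API.)"""
    body = "\n".join(f"- {fact}" for fact in sorted(facts))
    return f"## entity: {entity_id}\n{body}\n"

# Same facts in any order produce byte-identical text: the snapshot
# only changes when an observation actually changes it.
a = render_snapshot("acme-corp", ["renewed contract", "opened ticket"])
b = render_snapshot("acme-corp", ["opened ticket", "renewed contract"])
assert a == b
```

A file gets this property for free by being inert; a structured store has to earn it by making rendering deterministic.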
The write path is where the economics differ, and where they should. Writing an observation to a structured store with schema validation costs more than appending a line to a markdown file. That overhead is the price of versioning, provenance, and conflict detection. The question is whether that overhead is worth paying. If you have never needed to answer "what did my agent know last Tuesday" or "which write corrupted this entity," then no. Markdown is correct. If you have needed those answers and could not get them, the write-path cost is the cheapest part of the problem.
The migration story is straightforward. You started with MEMORY.md because it was the right default. You hit the ceiling when you needed versioning, or concurrent access, or provenance, or entity resolution across sessions. The next step is not "set up Postgres and build a custom schema." It is a structured layer that gives you those guarantees while preserving what worked about files: inspectability, simplicity, local-first operation.
The convergent evolution Lanham documented validates the problem. Three teams worth billions in aggregate arrived at the same architecture and hit the same walls. The walls define the next layer.