Memoir's storage substrate: prollytree
A Merkle search tree — what prollytree is, and what it enables for memoir.
Memoir is git-like version control for AI memory — capture, recall,
branch, commit, merge, all over semantic paths like
workflow.coding.style and
preferences.tools.editor. Under the hood, the storage
layer is prollytree: a Merkle search tree where keys
are sorted, nodes are content-addressed, and node boundaries are
determined by content-defined chunking.
Here's what that gives memoir.
Multi-agent merge that works at the key level
Agents write to memoir in parallel — different conversations, different branches, different parts of the taxonomy. When their work comes back together, the merge needs to be intelligible: not a tangle of text artifacts, but a structured view of what changed where.
Prollytree merges operate at the key level by construction. Same-key edits with different values surface as one clean conflict — value A from branch X versus value B from branch Y at key K, anchored in the tree-state diff. Different-key edits auto-merge without ceremony.
That's the foundation for memoir's merge policy: each conflict gets classified as trivial (auto-resolve), ambiguous (LLM-mediated clarification), or contradictory (escalate to the user). The classifier reads structured key-level diffs directly from the tree. No re-parsing, no heuristics — the substrate hands it the right primitive.
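To make the key-level behavior described in this section concrete, here is a minimal sketch of a three-way merge over plain Python dictionaries. It is an illustration only, not prollytree's API: the names `Conflict` and `three_way_merge` are hypothetical, and a real prollytree merge walks tree nodes rather than flat dicts. The point is the shape of the output: different-key edits merge silently, and a same-key disagreement surfaces as one structured record that a classifier could consume.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Conflict(NamedTuple):
    key: str
    base: Optional[str]    # value at the common ancestor (None means absent)
    ours: Optional[str]    # value on branch X
    theirs: Optional[str]  # value on branch Y


def three_way_merge(
    base: Dict[str, str], ours: Dict[str, str], theirs: Dict[str, str]
) -> Tuple[Dict[str, str], List[Conflict]]:
    """Merge two branches key by key against their common ancestor."""
    merged: Dict[str, str] = {}
    conflicts: List[Conflict] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o  # both branches agree (same edit, or both deleted)
        elif o == b:
            value = t  # only branch Y touched this key
        elif t == b:
            value = o  # only branch X touched this key
        else:
            # Both branches changed the same key in different ways.
            conflicts.append(Conflict(key, b, o, t))
            continue
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"workflow.coding.style": "tabs", "preferences.tools.editor": "vim"}
branch_x = {**base, "workflow.coding.style": "spaces", "workflow.coding.lint": "ruff"}
branch_y = {**base, "workflow.coding.style": "tabs-8", "preferences.tools.editor": "emacs"}

merged, conflicts = three_way_merge(base, branch_x, branch_y)
```

In this example the editor change and the new lint key merge automatically, while `workflow.coding.style` comes back as a single `Conflict` carrying the base value and both branch values. Conflicting keys are left out of `merged` until something resolves them.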
Subtree sharing with cryptographic proofs
Memoir supports sharing slices of memory, for requests such as
"Share my workflow.coding.* subtree with your team,
verifiably, without exposing the rest of my store."
Prollytree's O(log N) Merkle inclusion proofs make
this a first-class primitive. The receiver gets a compact tuple —
{subtree-path, subtree-root-hash, proof} —
and verifies inclusion against the sender's signed store-root. No
need to clone the full store. No need to trust the sender's server.
Importantly, this keeps memoir unified at the data layer. A
cross-cutting change touching workflow.coding.* and
workflow.devops.* remains a single atomic commit in
the same store. Sharing is layered on top as surgical access — it
doesn't fragment the underlying store into a constellation of
permission-scoped repositories.
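As a rough illustration of how a receiver might check the {subtree-path, subtree-root-hash, proof} tuple described above, the sketch below uses a plain binary Merkle tree built with SHA-256. This is an assumption made for brevity: prollytree's actual node layout, hash function, and proof encoding may differ, every function name here is hypothetical, and checking the signature on the store root is left out.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, True if the sibling sits on the left)


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so part boundaries are unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def leaf_hash(path: str, subtree_root: bytes) -> bytes:
    return h(b"leaf", path.encode(), subtree_root)


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, from the leaves up to the single root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left  # pair an odd node with itself
            parents.append(h(b"node", left, right))
        levels.append(parents)
    return levels


def make_proof(levels: List[List[bytes]], index: int) -> Proof:
    """Collect one sibling hash per level: O(log N) hashes for N leaves."""
    proof: Proof = []
    for level in levels[:-1]:
        sibling_index = index ^ 1
        sibling = level[sibling_index] if sibling_index < len(level) else level[index]
        proof.append((sibling, sibling_index < index))
        index //= 2
    return proof


def verify_inclusion(path: str, subtree_root: bytes, proof: Proof, store_root: bytes) -> bool:
    """Recompute the root from the shared tuple alone and compare it to the store root."""
    current = leaf_hash(path, subtree_root)
    for sibling, sibling_is_left in proof:
        current = h(b"node", sibling, current) if sibling_is_left else h(b"node", current, sibling)
    return current == store_root


# Sender side: five top-level subtrees, with placeholder subtree root hashes.
paths = ["preferences.tools", "preferences.ui", "workflow.coding",
         "workflow.devops", "workflow.writing"]
subtree_roots = [h(b"subtree", p.encode()) for p in paths]
levels = build_levels([leaf_hash(p, r) for p, r in zip(paths, subtree_roots)])
store_root = levels[-1][0]

shared = paths.index("workflow.coding")
proof = make_proof(levels, shared)

# Receiver side: only the tuple and the (separately authenticated) store root are needed.
ok = verify_inclusion("workflow.coding", subtree_roots[shared], proof, store_root)
```

The receiver never sees the other four subtrees, only one sibling hash per tree level, which is where the logarithmic proof size comes from.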
Federation at scale
Memoir is designed to pull in external hierarchical knowledge — internal wikis, design docs, technical glossaries, organizational standards — and map it into the taxonomy alongside agent-captured memories. Federation is meant to be routine, not heroic.
Prollytree's content-defined chunking is what makes that scale. When a 10,000-path import arrives, only the chunks that actually changed get rewritten, along with the nodes on the path from each of them up to the root. Identical subtrees across different imports dedupe automatically at the storage layer, because identical content hashes to the same address. Deep taxonomies don't multiply the rewrite footprint by depth.
The result: federation cost scales with the diff between successive imports, not with the absolute size of the imported tree. The same operation that's a one-off engineering project on most substrates becomes an everyday operational pattern on prollytree.
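The scaling claim above can be seen in a toy model of content-defined chunking. The sketch below is a simplification under stated assumptions: it chunks only a single leaf level, the boundary rule (a hash of each entry modulo a target size) and the constant `TARGET_CHUNK_SIZE` are hypothetical, and prollytree's real boundary function and node format may differ. It imports 10,000 paths, then re-imports them with one value changed and one path added, and counts how many chunk IDs are new.

```python
import hashlib
from typing import Dict, List

TARGET_CHUNK_SIZE = 16  # hypothetical average number of entries per chunk


def is_boundary(key: str, value: str) -> bool:
    """An entry closes its chunk when its own hash hits the target pattern."""
    digest = hashlib.sha256(f"{key}\x00{value}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % TARGET_CHUNK_SIZE == 0


def chunk_ids(entries: Dict[str, str]) -> List[str]:
    """Split sorted entries into chunks and return each chunk's content hash."""
    ids: List[str] = []
    current = hashlib.sha256()
    pending = False
    for key in sorted(entries):
        value = entries[key]
        current.update(f"{key}\x00{value}\x01".encode())
        pending = True
        if is_boundary(key, value):
            ids.append(current.hexdigest())
            current = hashlib.sha256()
            pending = False
    if pending:  # trailing entries that never hit a boundary
        ids.append(current.hexdigest())
    return ids


first_import = {f"wiki.page{i:05d}.title": f"Title {i}" for i in range(10_000)}

second_import = dict(first_import)
second_import["wiki.page04242.title"] = "Renamed page"  # one edited value
second_import["wiki.page10000.title"] = "New page"      # one added path

old_ids = chunk_ids(first_import)
new_ids = chunk_ids(second_import)
rewritten = [cid for cid in new_ids if cid not in set(old_ids)]

print(f"{len(new_ids)} chunks in the second import, {len(rewritten)} of them new")
```

Because boundaries depend only on the entries themselves, an edit cannot shift the boundaries of distant chunks. Every chunk away from the two edits keeps its content hash and would be stored once, regardless of how large the import is.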
Vector search integrated with the data
As memoir stores grow into the thousands of paths, semantic search via embeddings stops being optional. The natural place for that index — without giving up branch-level dedup or cross-substrate consistency — is inside the prollytree itself.
Vector indexes live as part of prollytree node payloads (or as a sibling content-addressed graph keyed by node hash). Two branches that share 99% of their memories therefore share roughly 99% of their vector index in storage, rather than each keeping a full copy. Incremental updates to embeddings follow the same chunking strategy as the data they describe.
Semantic search becomes a first-class capability of the store, with the same consistency, branching, and proof story as everything else memoir manages.
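One way to picture the "sibling content-addressed graph keyed by node hash" option is the sketch below. It is a toy under loud assumptions: `SidecarIndex`, `node_hash`, and `toy_embed` are hypothetical names, leaf nodes are modeled as tuples of (path, text) pairs, and the embedding function is a hash-based placeholder rather than a real model. It shows only the sharing property: embeddings are stored once per node hash, so branches that share nodes share index blocks.

```python
import hashlib
from typing import Dict, List, Tuple

Node = Tuple[Tuple[str, str], ...]  # a leaf node: sorted (path, memory text) pairs


def node_hash(node: Node) -> str:
    """Content address of a node (a stand-in for the tree's own node hash)."""
    return hashlib.sha256(repr(node).encode()).hexdigest()


def toy_embed(text: str) -> List[float]:
    """Hypothetical placeholder for a real embedding model."""
    digest = hashlib.sha256(text.encode()).digest()
    return [byte / 255.0 for byte in digest[:4]]


class SidecarIndex:
    """Embedding blocks keyed by node hash, shared by every branch."""

    def __init__(self) -> None:
        self.blocks: Dict[str, List[List[float]]] = {}
        self.embed_calls = 0

    def _embed(self, text: str) -> List[float]:
        self.embed_calls += 1
        return toy_embed(text)

    def index_branch(self, nodes: List[Node]) -> List[str]:
        """Index a branch, embedding only nodes that are not already stored."""
        hashes = []
        for node in nodes:
            key = node_hash(node)
            if key not in self.blocks:
                self.blocks[key] = [self._embed(text) for _, text in node]
            hashes.append(key)
        return hashes


n1: Node = (("preferences.tools.editor", "uses vim"), ("preferences.tools.shell", "uses zsh"))
n2: Node = (("workflow.coding.lint", "runs ruff"), ("workflow.coding.style", "prefers tabs"))
n3: Node = (("workflow.devops.ci", "uses GitHub Actions"), ("workflow.devops.deploy", "blue-green"))
n3_edited: Node = (("workflow.devops.ci", "uses GitHub Actions"), ("workflow.devops.deploy", "canary"))

index = SidecarIndex()
main_hashes = index.index_branch([n1, n2, n3])
feature_hashes = index.index_branch([n1, n2, n3_edited])
shared_blocks = set(main_hashes) & set(feature_hashes)
```

Indexing the second branch embeds only the one node that changed. The two untouched nodes resolve to index blocks that already exist, which is the branch-level dedup the section describes.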
Looking ahead
Memoir is the substrate for collaborative AI memory — memory that multiple agents write to in parallel, that teams share slices of with verifiability, that ingests organizational knowledge as routine practice, and that scales to semantic search over thousands of paths.
Each of those capabilities is tractable because prollytree gives memoir the right primitives at the right layer:
- Merge at the key level — for multi-agent collaboration.
- Proofs at the subtree level — for team-scale sharing.
- Dedup at the chunk level — for federation.
- Indexes that share storage with the data they index — for semantic search.
Just as git made source-level collaboration tractable for an entire industry, a Merkle-tree-based memory substrate is what will make agent-level collaboration tractable for the next. Memoir is the bet on what that looks like.