
Two trees, one commit: vector search lands in Memoir

Coming soon to Memoir: hierarchical recall and vector recall over the same versioned store. The path tells you where a memory lives; the vector tells you what it says. Together they let Memoir hold long-form content without losing the ability to find a specific sentence inside it. One transaction, one commit, one branch: no separate vector DB, no sync drift.

The shape of the feature

Memoir today stores agent memory as a hierarchical taxonomy: named paths like context.project.architecture and knowledge.technical.build, versioned with git-style commits, branches, and diffs. Recall is by path. Great for structured facts. Less great when you want fuzzy similarity — "what did we say last week that sounded like this?"

The feature landing soon: a vector index that lives inside the same store. Same Merkle search tree (prollytree) underneath. Same commits. Same branches. When you write a memory, its embedding is computed and persisted in the same transaction — no eventual-consistency sync job, no second database to operate, no drift between the text and its vectors.
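Here is a minimal Python sketch of the "same transaction" invariant. It is a toy model, not Memoir's API. `ToyStore`, its `write`/`delete` methods, and the bag-of-words `toy_embed` are hypothetical stand-ins for a real embedding model and a prollytree-backed store. The sketch demonstrates ordering: the embedding is computed before either tree is touched, so a failing embedder leaves both the text and its vector exactly as they were.

```python
import re
from collections import Counter
from typing import Callable, Dict

Embedder = Callable[[str], Counter]


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: sparse bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


class ToyStore:
    """Toy model, not Memoir's API: primary and index change together or not at all."""

    def __init__(self, embed: Embedder = toy_embed) -> None:
        self.embed = embed
        self.primary: Dict[str, str] = {}    # key -> text
        self.index: Dict[str, Counter] = {}  # same key -> embedding

    def write(self, key: str, text: str) -> None:
        # Compute the vector first: if the embedder raises, neither tree changes.
        vector = self.embed(text)
        # Then apply both updates together, so text and vector cannot drift apart.
        self.primary[key] = text
        self.index[key] = vector

    def delete(self, key: str) -> None:
        # Removing the primary entry removes its vector in the same step.
        self.primary.pop(key, None)
        self.index.pop(key, None)


def failing_embed(text: str) -> Counter:
    # Simulates an embedding model that is unavailable.
    raise RuntimeError("embedding model unavailable")


store = ToyStore()
store.write("knowledge.technical.build", "example note about the build")
store.embed = failing_embed
try:
    store.write("knowledge.technical.build", "a newer note")
except RuntimeError:
    pass
print(store.primary["knowledge.technical.build"])  # still the original note
```

The toy only models in-process ordering. Surviving a crash is what persisting both trees in one commit is meant to provide.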

Two trees, one commit

The mental model is two prollytrees per namespace:

  • Primary. Your data — the document at docs/123, the memory at knowledge.technical.build. Same store you have today.
  • Index. Embeddings of the same content, addressable by the same key. Updated atomically when the primary key is written, and deleted when the primary key is deleted.

Both trees live under the same git ref, so a single store.commit() covers both. Roll back the commit and the index travels with the data; there is no reindex job. Branch the store and you branch the index too. Diff two commits and you see exactly which vectors were added, removed, or changed.
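The "two trees, one commit" idea can be sketched with plain dictionaries. This is a hypothetical toy rather than Memoir's implementation. `ToyVersionedStore`, `reset`, `branch`, and `diff_index` are illustrative names. Commits here are full snapshots rather than prollytree nodes, and the commit id hashes only the two trees' contents. What the toy does show is that one snapshot holds both the primary and the index, so rollback, branching, and diffing cover the vectors without extra work.

```python
import hashlib
import json
import re
from collections import Counter
from typing import Dict, Optional, Tuple

Snapshot = Tuple[Dict[str, str], Dict[str, Dict[str, int]]]


def toy_embed(text: str) -> Dict[str, int]:
    # Hypothetical stand-in for an embedding model: bag-of-words counts.
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


class ToyVersionedStore:
    """Toy model, not Memoir's API: one commit snapshots both trees."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.index: Dict[str, Dict[str, int]] = {}
        self.commits: Dict[str, Snapshot] = {}
        self.branches: Dict[str, Optional[str]] = {"main": None}
        self.head = "main"

    def write(self, key: str, text: str) -> None:
        self.primary[key] = text
        self.index[key] = toy_embed(text)

    def commit(self) -> str:
        snap: Snapshot = (dict(self.primary), {k: dict(v) for k, v in self.index.items()})
        # Toy commit id: a hash of both trees' contents only (a real commit
        # would also record its parent, author, and message).
        cid = hashlib.sha256(json.dumps(snap, sort_keys=True).encode()).hexdigest()[:12]
        self.commits[cid] = snap
        self.branches[self.head] = cid
        return cid

    def reset(self, cid: str) -> None:
        # Roll the current branch back: data and index are restored together.
        primary, index = self.commits[cid]
        self.primary = dict(primary)
        self.index = {k: dict(v) for k, v in index.items()}
        self.branches[self.head] = cid

    def branch(self, name: str) -> None:
        # A new branch starts at the current commit, carrying both trees.
        self.branches[name] = self.branches[self.head]
        self.head = name

    def diff_index(self, old: str, new: str) -> Dict[str, list]:
        # Which vectors were added, removed, or changed between two commits.
        a, b = self.commits[old][1], self.commits[new][1]
        return {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }


store = ToyVersionedStore()
store.write("docs.a", "alpha note")
c1 = store.commit()
store.write("docs.a", "alpha note revised")
store.write("docs.b", "beta note")
c2 = store.commit()
print(store.diff_index(c1, c2))  # {'added': ['docs.b'], 'removed': [], 'changed': ['docs.a']}
store.reset(c1)                  # docs.b disappears from both trees
```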

Hierarchical recall (read a path, walk a subtree) and vector recall (top-k by similarity) become two operations on the same versioned object. Nothing else changes about how Memoir tracks history.

The path tells you where. The vector tells you what.

Today, every memory has to be small enough that its path carries meaning. A fact about Comet's classloader fits under knowledge.technical.build — short, classified, findable because you already know the slot. That works beautifully for distilled lessons. It breaks the moment you want to keep a five-page design doc, a meeting transcript, or a long agent narration. There's no path granular enough to address a sentence inside a page.

Vector recall is exactly the layer that's missing. The hierarchy still does the coarse work — which subject is this even about? — and the index does the fine work — which paragraph in that subject is closest to my query? Same store, two granularities, working together:

  • Path-first. Walk to context.project.architecture when you know the category, then let vector search rank chunks within that subtree by relevance. Hierarchy is the filter; vectors are the sort.
  • Vector-first. Search by similarity across a namespace and look at where the hits landed in the hierarchy — the path itself becomes a label on the match, telling you whether you've found a build note or a spatial-join correction.
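Both recall modes reduce to a filter plus a sort. This hypothetical sketch is not Memoir's API. `path_first` keeps only paths under a dotted prefix and ranks them by cosine similarity. `vector_first` ranks everything and returns each path as a label on the match. The bag-of-words `toy_embed` stands in for a real embedding model, and the example memories are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def in_subtree(path: str, prefix: str) -> bool:
    # "a.b" is inside "a", but "ab.c" is not.
    return path == prefix or path.startswith(prefix + ".")


def path_first(index: Dict[str, Counter], prefix: str, query: str,
               k: int = 3) -> List[Tuple[str, float]]:
    # Hierarchy is the filter: keep only paths under the prefix...
    q = toy_embed(query)
    hits = [(p, cosine(q, v)) for p, v in index.items() if in_subtree(p, prefix)]
    # ...and vectors are the sort.
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]


def vector_first(index: Dict[str, Counter], query: str,
                 k: int = 3) -> List[Tuple[str, float]]:
    # Rank the whole namespace; each returned path labels where the hit lives.
    q = toy_embed(query)
    hits = [(p, cosine(q, v)) for p, v in index.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]


# Made-up example memories.
memories = {
    "context.project.architecture.storage": "the store keeps primary and index trees in one commit",
    "context.project.architecture.api": "the api exposes recall by path and by similarity",
    "knowledge.technical.build": "the build uses a shaded jar to avoid classloader conflicts",
}
index = {path: toy_embed(text) for path, text in memories.items()}

print(path_first(index, "context.project.architecture", "recall by similarity"))
print(vector_first(index, "classloader conflicts in the build")[0][0])  # knowledge.technical.build
```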

The practical consequence: Memoir entries no longer have to stay small to stay findable. You can keep a long-form artifact at one path — a transcript, an onboarding doc, an agent's entire reasoning trace from a hard debug session — and still pull out the right sentence later. The taxonomy captures the classification; the vector captures the contents; you don't have to choose.
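One way to keep a long artifact at a single path and still find the sentence inside it is to store the text once and index it in chunks. The sketch below assumes sentence-level chunks keyed by `(path, chunk number)`. The splitter, the key scheme, the example path `context.project.meetings.kickoff`, the made-up transcript, and `toy_embed` are all hypothetical, and the post does not say how Memoir chunks content.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

ChunkIndex = Dict[Tuple[str, int], Tuple[str, Counter]]


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_sentences(text: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def index_long_form(path: str, text: str) -> ChunkIndex:
    # One primary entry at `path`; many index entries keyed by (path, chunk number).
    return {(path, i): (chunk, toy_embed(chunk))
            for i, chunk in enumerate(chunk_sentences(text))}


def best_sentence(index: ChunkIndex, query: str) -> Tuple[str, int, str]:
    # Return the chunk whose vector is closest to the query.
    q = toy_embed(query)
    (path, i), (chunk, _) = max(index.items(), key=lambda item: cosine(q, item[1][1]))
    return path, i, chunk


transcript = (
    "We agreed to keep the primary tree unchanged. "
    "The index tree stores one vector per chunk. "
    "Rollback restores both trees together."
)
chunks = index_long_form("context.project.meetings.kickoff", transcript)
print(best_sentence(chunks, "one vector per chunk"))
# ('context.project.meetings.kickoff', 1, 'The index tree stores one vector per chunk.')
```

The path still classifies the whole transcript. The chunk number only locates the sentence within it.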

Why this matters for agent memory

Most "give your agent a memory" stacks pair a primary store with a separate vector database. That gets you fuzzy recall, but it also gets you the worst version-control story in your system: two truths, a sync job between them, and no way to ask "what did the agent know at commit X?"

With the index inside the versioned store, the questions you can already ask of Memoir extend naturally to embeddings:

  • Rewind a poisoned corpus. An agent ingested a bad batch. Roll back to the last good commit; the index reverts with the data.
  • A/B test embedders. Branch the store, re-embed with a different model, compare recall on the same queries, merge the winner back.
  • Audit what changed. Diff two commits and see which memories — and which vectors — were added, removed, or rewritten.
  • Isolate multi-agent state. Namespaces still work the way you'd expect. Each agent's primary and index move together.
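The embedder A/B test can be sketched as two indexes built from the same primary snapshot. Each index stands in for a branch, and both are scored with recall@1 on the same labelled queries. Everything here is hypothetical: the two toy embedders (plain bag-of-words versus a crude plural-stripping variant), the `recall_at_1` helper, and the sample data.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Embedder = Callable[[str], Counter]


def words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed_plain(text: str) -> Counter:
    # Hypothetical embedder A: raw bag-of-words counts.
    return Counter(words(text))


def embed_stemmed(text: str) -> Counter:
    # Hypothetical embedder B: strips a trailing "s" from words longer than
    # three letters (crude: "this" becomes "thi"), so plurals match singulars.
    return Counter(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_1(primary: Dict[str, str], embed: Embedder,
                labelled: List[Tuple[str, str]]) -> float:
    # Each "branch" re-embeds the same primary snapshot with its own model.
    index = {key: embed(text) for key, text in primary.items()}
    hits = 0
    for query, expected in labelled:
        q = embed(query)
        best_key, best_score = max(
            ((key, cosine(q, vec)) for key, vec in index.items()), key=lambda kv: kv[1]
        )
        # A zero score means nothing matched, so it never counts as a hit.
        if best_score > 0 and best_key == expected:
            hits += 1
    return hits / len(labelled)


primary = {"a": "rollback restores the index", "b": "namespaces isolate experiments"}
labelled = [("restore index", "a"), ("experiment namespace", "b")]

scores = {"plain": recall_at_1(primary, embed_plain, labelled),
          "stemmed": recall_at_1(primary, embed_stemmed, labelled)}
winner = max(scores, key=scores.get)
print(scores, winner)  # {'plain': 0.5, 'stemmed': 1.0} stemmed
```

Here the stemmed embedder matches "experiment" to "experiments" and wins. In the workflow the post describes, the winning branch would then be merged back.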

Walk the deck

The clearest way to see this is the interactive deck below. Eight slides: the model, the API surface, the branching story, the use cases.

[Interactive deck: versioned vector search on prollytree]

What this changes for you

Nothing, until you ask for it. Existing Memoir stores stay structurally identical. The vector index is opt-in per namespace; without it, recall and history work exactly as they do today. Turn it on and a second tree starts being maintained alongside your primary — same commits, same branches, same merge semantics.
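"Opt-in per namespace" can be pictured as a flag that decides whether a second tree exists at all. This is a hypothetical toy, and `Namespace` and `vector_index` are illustrative names rather than Memoir configuration. With the flag off, writes touch only the primary, as they do today. With it on, each write also records a vector under the same key.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Namespace:
    """Toy model, not Memoir's API: the vector index is opt-in per namespace."""

    name: str
    vector_index: bool = False  # hypothetical flag, off by default
    primary: Dict[str, str] = field(default_factory=dict)
    index: Optional[Dict[str, Counter]] = None

    def __post_init__(self) -> None:
        # Only namespaces that opt in get a second tree at all.
        if self.vector_index and self.index is None:
            self.index = {}

    def write(self, key: str, text: str) -> None:
        if self.index is not None:
            # Indexed namespace: embed first, then update both trees.
            vector = toy_embed(text)
            self.primary[key] = text
            self.index[key] = vector
        else:
            # Plain namespace: primary only, exactly as before.
            self.primary[key] = text


plain = Namespace("agent_a")
indexed = Namespace("agent_b", vector_index=True)
for ns in (plain, indexed):
    ns.write("knowledge.technical.build", "example build note")
print(plain.index, sorted(indexed.index))  # None ['knowledge.technical.build']
```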

The point isn't to replace path-based recall. It's to give you the second axis without giving up the first — and without giving up the version control story that made Memoir worth picking in the first place.


Embeddings are derived data. Treat them like derived data — versioned, branchable, recoverable — and most of the operational pain of vector search just goes away.