Why Ergo

The problem: agents don't just forget — they remember wrong

A vector store gives an agent recall. It does not give it consistency.

When a policy, path, or config changes, the old claim stays in the store — still embedded, still retrievable, still confident. The next session recalls it and acts on it. The failure mode isn't amnesia; it's stale memory poisoning new decisions, silently, with no signal that anything is off.

Retrieval alone can't fix this. On a realistic contradiction benchmark, an embedder pulls a stored claim's contradiction into the top‑10 neighborhood 87% of the time (57% at top‑1). To a vector store, a contradiction is just a very similar document — the retriever finds it; nothing judges it.
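The claim that a contradiction looks like a near-duplicate can be reproduced with a toy retriever. This sketch uses a bag-of-words count vector in place of a real embedder. That is an assumption, and the toy does not reproduce the 87%/57% figures, which come from a real embedder on the benchmark. It does show that adding a single "not" barely moves cosine similarity, so a stored claim is the top-1 hit for its own negation.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedder (assumption): lowercase bag-of-words counts.
    return Counter(text.lower().replace(".", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The whole "vector store": a list of documents, ranked by similarity at query time.
store = [
    "Image releases must go through the release script.",
    "The staging cluster runs on Tuesdays.",
    "Use tabs for indentation in the build files.",
]

def top_k(query: str, k: int = 1) -> list[tuple[float, str]]:
    q = embed(query)
    return sorted(((cosine(q, embed(doc)), doc) for doc in store), reverse=True)[:k]

# A direct contradiction of the first stored claim retrieves that claim at top-1.
print(top_k("Image releases must not go through the release script."))
# top hit is store[0], with cosine ≈ 0.94
```

Nothing in this loop judges the hit. It is simply the most similar document.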

What Ergo does differently: verify at write time

Ergo is the judge bolted onto the retriever. Every guarded write (remember, learn, supersede) runs a four-stage check against the active claims in scope before storing:

normalize → structural compare → gated NLI → tier decision
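One way to picture how the four stages compose is the skeleton below. Every function name, the tier labels, and the 0.9/0.6 probability cut-offs are hypothetical placeholders, not Ergo's internals. The structural rule is a deliberately crude "same sentence except for a negation" check. The NLI model and the overlap gate are passed in as plain callables so the control flow stands on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Verdict:
    tier: str                    # hypothetical tiers: "hard" (block, 409), "soft" (store + warn), "none"
    prior: Optional[str] = None  # the active claim that triggered the verdict
    note: str = ""

def normalize(text: str) -> str:
    # Stage 1 (placeholder): canonical case and whitespace.
    return " ".join(text.lower().split())

def structural_compare(new: str, old: str) -> Optional[str]:
    # Stage 2 (placeholder): a high-precision rule, "same sentence except for a negation".
    def drop_not(s: str) -> str:
        return f" {s} ".replace(" not ", " ").strip()
    return "hard" if new != old and drop_not(new) == drop_not(old) else None

def check_write(claim: str, active: List[str],
                nli: Callable[[str, str], float],
                gate: Callable[[str, str], bool]) -> Verdict:
    new = normalize(claim)
    result = Verdict("none")
    for prior in active:
        old = normalize(prior)
        tier = structural_compare(new, old)
        # Stage 3: NLI only on the residual structure could not decide, and only past the gate.
        if tier is None and gate(new, old):
            p = nli(new, old)  # assumed: probability that new contradicts old
            tier = "hard" if p >= 0.9 else "soft" if p >= 0.6 else None
        # Stage 4: tier decision; any hard conflict blocks the write immediately.
        if tier == "hard":
            return Verdict("hard", prior, "contradicts an active claim")
        if tier == "soft" and result.tier == "none":
            result = Verdict("soft", prior, "possible conflict; stored with a warning")
    return result

active = ["Image releases must go through the release script."]
print(check_write("Image releases must NOT go through the release script.", active,
                  nli=lambda a, b: 0.0, gate=lambda a, b: True).tier)  # hard
```

The ordering matters. The cheap, precise stages run first, and the expensive, noisy model only sees what they leave undecided.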

A hard conflict comes back as HTTP 409 with the prior claim and its reason, and the caller must resolve it: cancel, supersede with a new reason, or force an exception on the record. Softer conflicts are stored with a warning. Every claim can carry a reason; `why` refuses to answer from reason-less facts, and supersede chains preserve why a decision changed.
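From the caller's side, the 409 contract reduces to a three-way decision. The sketch below assumes a response that carries the prior claim's id, text, and reason. The `WriteResult` shape, its field names, and the rule "a new reason means supersede, an exception note means force, otherwise cancel" are illustrative assumptions, not Ergo's client API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WriteResult:
    status: int                        # 409 = hard conflict; anything else is assumed to mean "stored"
    prior_id: Optional[str] = None     # hypothetical fields describing the conflicting prior claim
    prior_text: Optional[str] = None
    prior_reason: Optional[str] = None
    warning: Optional[str] = None      # soft conflicts: stored, but flagged

def next_action(result: WriteResult, new_text: str,
                new_reason: Optional[str] = None,
                exception_note: Optional[str] = None) -> dict:
    """Map a (hypothetical) guarded-write response to the caller's follow-up."""
    if result.status != 409:
        return {"action": "done", "warning": result.warning}
    if new_reason:
        # The decision really changed: supersede the prior claim and record why.
        return {"action": "supersede", "replaces": result.prior_id,
                "text": new_text, "reason": new_reason}
    if exception_note:
        # The prior decision stands, but this case is a deliberate, logged exception.
        return {"action": "force", "against": result.prior_id, "note": exception_note}
    # No justification: drop the write and let the settled decision stand.
    return {"action": "cancel", "prior": result.prior_text, "because": result.prior_reason}

conflict = WriteResult(409, prior_id="c-17",
                       prior_text="Image releases must go through the release script.",
                       prior_reason="Hand-run docker sequences skipped steps.")
print(next_action(conflict, "Release by hand this once.")["action"])                             # cancel
print(next_action(conflict, "The release script moved.", new_reason="New directory")["action"])  # supersede
```

None of the three branches lets the new claim silently replace the old one. Every path either leaves the prior decision in place or records why it was overridden.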

The write that comes back 409 is the product. It converts "silently overwrite a settled decision" into "confront the prior decision, on the record."

Measured, not asserted

On 105 labeled real-prose pairs, the judge runs at roughly P ≈ 0.80–0.86 / R ≈ 0.81 with zero false hard-blocks — and zero false blocks across every deploy smoke test since. The full engine ships with 292 offline tests, and the Docker build fails if the image touches the network at boot (models are baked in).

config	precision	recall	F1
structural-only	0.93	0.47	0.62
NLI ungated	0.42	0.70	0.53
NLI + gates (Ergo)	0.96	0.73	0.83
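The F1 column is the harmonic mean of the other two, F1 = 2PR / (P + R). Recomputing it from the listed precision and recall makes a quick consistency check on the table.

```python
rows = {
    "structural-only": (0.93, 0.47),
    "NLI ungated": (0.42, 0.70),
    "NLI + gates (Ergo)": (0.96, 0.73),
}

def f1(precision: float, recall: float) -> float:
    # Harmonic mean: high only when precision and recall are both high.
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

for name, (p, r) in rows.items():
    print(f"{name:<20} F1 = {f1(p, r):.3f}")
```

This prints 0.624, 0.525, and 0.829, which round to the table's 0.62, 0.53, and 0.83. The harmonic mean explains why structural-only scores lower despite 0.93 precision: its 0.47 recall drags F1 down.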

Normalizing first, comparing structure, and only then running NLI on the fuzzy residual behind an overlap gate beats every stage on its own.
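An ungated NLI model can flag unrelated pairs as contradictions, which is consistent with the ungated row's 0.42 precision. Below is a minimal sketch of an overlap gate. It assumes token-level Jaccard similarity and a 0.5 threshold. Both are stand-ins: the source names an "overlap gate" without specifying the metric. The gate spends an NLI call only when two claims share enough vocabulary to plausibly be about the same thing.

```python
from typing import Callable, Optional

def jaccard(a: str, b: str) -> float:
    # Token-set overlap: |A ∩ B| / |A ∪ B| (assumed gate metric).
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def gated_nli(new: str, old: str, nli: Callable[[str, str], float],
              min_overlap: float = 0.5) -> Optional[float]:
    """Return the NLI contradiction score, or None if the pair never reaches the model."""
    if jaccard(new, old) < min_overlap:
        return None  # gate closed: too little shared vocabulary to be the same topic
    return nli(new, old)

calls = []
def fake_nli(a: str, b: str) -> float:
    calls.append((a, b))
    return 0.97  # a noisy model that calls everything a contradiction

old = "image releases must go through the release script"
print(gated_nli("image releases must skip the release script", old, fake_nli))  # 0.97
print(gated_nli("the staging cluster runs on tuesdays", old, fake_nli))         # None
print(len(calls))                                                                # 1
```

Even with a model that always says "contradiction," the unrelated pair never gets judged. That is the precision gain the gated row reflects.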

A real catch (anonymized)

A team recorded a convention: "image releases must go through the release script, not a hand-run sequence of docker commands." Two days later the script moved to a different directory. When someone recorded the new convention, the write-time gate matched it against the stale claim, so the new claim did not become a second, contradicting "truth" sitting next to the first. It resolved as a supersede, and the new claim's reason records why the script moved.

The avoided mistake: a future session asks "how do I release?" and either gets the dead path or, seeing two conflicting claims, concludes the convention is contested and hand-runs the exact docker commands the claim existed to prevent.
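The supersede chain behind this catch can be modeled as claims that point back at what they replaced. In this sketch, `remember`, `supersede`, and `why` are hypothetical in-memory stand-ins named after the operations in the text. Their signatures are invented, and the claim texts and reasons are illustrative. The sketch shows two rules described above: a supersede must carry a reason, and `why` refuses to explain a claim that has none.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Claim:
    id: str
    text: str
    reason: Optional[str] = None
    supersedes: Optional[str] = None  # id of the claim this one replaced
    active: bool = True

claims: Dict[str, Claim] = {}

def remember(cid: str, text: str, reason: Optional[str] = None) -> None:
    claims[cid] = Claim(cid, text, reason)

def supersede(old_id: str, new_id: str, text: str, reason: str) -> None:
    if not reason:
        raise ValueError("a supersede must record why the decision changed")
    claims[old_id].active = False  # retired, but kept for history
    claims[new_id] = Claim(new_id, text, reason, supersedes=old_id)

def why(cid: str) -> List[str]:
    """Explain a claim by walking its supersede chain, newest first."""
    if not claims[cid].reason:
        raise LookupError(f"{cid} has no recorded reason; refusing to explain it")
    history: List[str] = []
    current: Optional[Claim] = claims[cid]
    while current is not None:
        history.append(f"{current.id}: {current.text} (because: {current.reason or 'unrecorded'})")
        current = claims.get(current.supersedes) if current.supersedes else None
    return history

remember("c1", "Image releases must go through the release script.",
         "Hand-run docker sequences skipped steps.")
supersede("c1", "c2", "Image releases must go through the release script at its new path.",
          "The script moved to a different directory.")
for line in why("c2"):
    print(line)
```

Only `c2` is active, so recall returns one path. The retired `c1`, along with its original rationale, is still reachable through `why`.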

Contradictions are rare events with outsized cost — that's the point. A guard that fires seldom but truthfully beats a feed of noisy similarity alerts.

Why not just X?

Alternative	What it does NOT do (that Ergo does)
ChromaDB (+ embedder)	Retrieval only. No write-time judgment — a contradiction is stored as one more similar document (and it will co-retrieve, unjudged). No reasons, no supersede semantics, no "why did this change" history.
Pinecone	Same judgment gap, plus it's a managed cloud service — a non-starter for air-gapped deployments.
pgvector	Brings a Postgres server dependency for the same unjudged-similarity semantics. Ergo's deploy artifact is one container + one SQLite file; restore = copy one file.
Plain SQLite + embedder	This is Ergo's storage layer — minus the entire point: the normalizer, structural comparator, gated NLI judge, 409 contract, reasoned `why`, supersede chains, and the eval suites that calibrate the judge. The store is not the moat.
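Taking that one-file copy safely from a live database is itself only a few lines. If the file may be open for writing, or is in WAL mode where recent commits can sit in a side file, a raw file copy can miss or tear data. Python's standard `sqlite3` module exposes SQLite's online backup API as `Connection.backup`, which writes a consistent single-file snapshot. The file names and the `claims` table below are hypothetical; this is not Ergo's own backup tooling.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

def snapshot(src_path: str, dst_path: str) -> None:
    # Connection.backup copies the database through SQLite itself (including committed
    # data still in a WAL file), so the destination is a consistent single file.
    with closing(sqlite3.connect(src_path)) as src, closing(sqlite3.connect(dst_path)) as dst:
        src.backup(dst)

# Demo with a throwaway database; "ergo.db" and the claims table are hypothetical.
with tempfile.TemporaryDirectory() as d:
    live, copy = os.path.join(d, "ergo.db"), os.path.join(d, "ergo-backup.db")
    with closing(sqlite3.connect(live)) as db:
        db.execute("CREATE TABLE claims (id TEXT PRIMARY KEY, text TEXT, reason TEXT)")
        db.execute("INSERT INTO claims VALUES ('c1', 'Releases use the script.', 'Fewer manual errors.')")
        db.commit()
    snapshot(live, copy)
    with closing(sqlite3.connect(copy)) as db:
        print(db.execute("SELECT COUNT(*) FROM claims").fetchone()[0])  # 1
```

Restoring is then the plain file copy the table describes, done while the service is stopped.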

Where Ergo is overkill — use commodity RAG instead

Reference retrieval. Docs, runbooks, notes: anything you want found, not defended. Ergo routes this through `ingest`, which deliberately skips the gate and is ~30× faster.
High-volume, low-stakes memory. Chat history, scratch notes, transient status. A guarded write costs one normalizer call; paying that tax on content nobody will contradict is waste.
High-throughput multi-writer workloads. Ergo is single-writer SQLite by design.
Claims that churn constantly. If "the truth" changes hourly, supersede chains become noise. Ergo is for settled decisions with a shelf life.
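The per-write cases above reduce to a routing rule: send settled, slow-changing decisions through the guarded path and everything else through `ingest`. The `Kind` categories and the `route` helper are hypothetical illustrations of that rule, not part of Ergo. The multi-writer case is a deployment choice rather than a per-write one, so it is left out.

```python
from enum import Enum

class Kind(Enum):
    REFERENCE = "reference"  # docs, runbooks, notes: want found, not defended
    TRANSIENT = "transient"  # chat history, scratch notes, transient status
    DECISION = "decision"    # conventions, configs, policies, architecture choices

def route(kind: Kind, churns_hourly: bool = False) -> str:
    """Hypothetical rule: only settled, slow-changing decisions pay for the guarded write."""
    if kind is Kind.DECISION and not churns_hourly:
        return "remember"  # guarded path: normalize, structural compare, gated NLI, tier decision
    return "ingest"        # ungated path, which the text describes as roughly 30x faster

print(route(Kind.REFERENCE), route(Kind.DECISION), route(Kind.DECISION, churns_hourly=True))
# ingest remember ingest
```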

The honest positioning

Ergo is not a universal memory firewall. It is selective safety rails for high-stakes, slow-changing decision domains — conventions, configs, policies, architecture choices, ops-runbook invariants — where a stale claim silently reversing a settled decision costs real money or real outages.

The moat is not the vector store (that part genuinely is a few lines). The moat is the judgment layer, the labeled evals that keep it honest, and single-file air-gap deployability.