The contradiction guard

Every guarded write (remember, learn, supersede) runs a four-stage pipeline against the active claims in scope before anything is stored. ingest skips it entirely — reference facts upsert freely.

incoming statement ─▶ 1. NORMALIZE ─▶ 2. STRUCTURAL ─▶ 3. NLI (gated) ─▶ 4. JUDGE ─▶ tier

The tier decides the outcome: clean (store), warn (store + surface a warning), or block (refuse with HTTP 409 and the conflicting claim).

Stage 1 — Normalize

Turns prose into a structured claim so the comparison isn't at the mercy of wording.

{
  "modality":     "should",        // must | should | may | must_not | should_not | may_not
  "subject":      "deploy",        // canonical entity / action
  "object":       null,            // canonical state verb (enabled/used/removed…), enum-constrained
  "value":        null,            // atomic value selector (colour/port/version…), free text
  "scope":        { "env": null, "team": null },
  "valid_from":   null,
  "valid_until":  null,
  "subject_kind": "PRESENT"        // PRESENT | FUZZY | MISSING
}

subject is the critical piece. If it's MISSING, the claim is treated as incomparable to anything that doesn't share the subject. If FUZZY, comparison defers to NLI.
value keeps the distinguisher out of subject (so subjects still match) and out of object (so the polarity enum stays clean). It's what lets "must use a blue canary" and "must use a red canary" be seen as a conflict instead of collapsing to the same norm.

Normalizers are pluggable (rule-based/deterministic, or an LLM backend). If the LLM returns something that isn't JSON — a refusal, chatter — extraction fails open to an unknown norm and the claim stores un-gated rather than erroring. A normalizer hiccup can never 500 a write.

Stage 2 — Structural compare

Pure, deterministic Python. Given the incoming norm and a candidate from the active set, it returns a verdict with a confidence tier (HIGH / MED / LOW):

if incoming.subject_kind == "MISSING":
    return "incomparable"                       # NLI never runs

if same_subject and overlapping_scope and overlapping_time:
    if opposing_modality:                       # must vs must_not, approved vs rejected …
        return "contradiction"                  # HIGH
    if same_direction_stance and value_a != value_b and both_atomic:
        return "contradiction (value)"          # HIGH — can't require BOTH
    if same_modality:
        return "consistent"                     # HIGH
    return "uncertain"                          # MED — defer

if subject_kind == "FUZZY":  return "defer to NLI"    # LOW
if disjoint_scope or disjoint_time:  return "coexist" # HIGH — principled, not an escape hatch
return "unknown"                                       # LOW — defer to NLI

Scope = environment + team + tenant. A prod claim and a dev claim aren't a contradiction even if the statements look opposing.
Time = valid_from / valid_until. Non-overlapping date windows → coexist.
Modality opposition is a fixed lexicon (must↔must_not, approved↔rejected, grant↔deny, …).
Value conflicts require both stances determined and equal, both values atomic (≤ 2 tokens), and the differing tokens grounded in the raw text — so a hallucinated value can't manufacture a contradiction, and config-enumeration docs don't false-flag.

Stage 3 — NLI (gated)

Natural-language inference runs only when structural returns LOW confidence (it couldn't decide). Even then the result is gated by subject/polarity/overlap checks before it's allowed to block — because raw NLI over-fires (>0.9) on unrelated cross-domain pairs. The gate is what turns a noisy classifier into a trustworthy blocker.

Stage 4 — Judge → tier

The judge combines the structural verdict and the (optional) gated NLI residual into a final tier:

tier	outcome
clean	store the claim, no conflict
warn	store, but return a warning (softer conflict / uncertain)
block	refuse the write — HTTP 409 with the conflicting claim(s) and their reasons

A blocked write is not an error to route around — it's the guard doing its job. Resolve it by cancelling, superseding with a new reason, or forcing an exception on the record.

The bet, validated

NLI alone on real prose barely beats a lexical baseline. Normalize → structural → gated-NLI → judge beats every piece alone — see the numbers.