
Cross-post from Yakihonne longform — sharing here because the failure mode shows up far beyond crypto infrastructure.

A field-note from running infrastructure audits: a large fraction of "AI agent memory" architectures fail because they collapse three orthogonal kinds of state into one container — usually the model's context window, or a bolt-on vector store.

The three stores that matter, and what they're each doing:

  • outcome-set — append-only event log of what this session has achieved. Past actions, completed tasks, observed states. Single source of truth for "what did we do."
  • constraint-set — versioned snapshot of the invariants the next session must respect. Config values, SLAs, hardcoded RPC endpoints, kill-switches. Read at session start to seed the operating envelope.
  • drift-log — every divergence between intent and reality, plus the reconciliation. Read at audit time to reconstruct the timeline of "where the agent's belief stopped matching the world."
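The separation above can be sketched as three containers with deliberately different APIs. This is a minimal illustration, not a schema from the longform; all class and method names here are my own invention:

```python
from dataclasses import dataclass, field

@dataclass
class OutcomeSet:
    """Append-only event log: what this session achieved."""
    events: list = field(default_factory=list)

    def record(self, event: str) -> None:
        self.events.append(event)  # append-only: no updates, no deletes

@dataclass
class ConstraintSet:
    """Versioned snapshot of invariants the next session must respect."""
    version: int = 0
    constraints: dict = field(default_factory=dict)

    def set(self, key: str, value) -> None:
        self.version += 1  # every change bumps the version, preserving auditability
        self.constraints[key] = value

    def snapshot(self) -> dict:
        return dict(self.constraints)  # read once at session start to seed the envelope

@dataclass
class DriftLog:
    """Every intent/reality divergence plus its reconciliation."""
    entries: list = field(default_factory=list)

    def log(self, intent: str, observed: str, reconciliation: str) -> None:
        self.entries.append({"intent": intent, "observed": observed,
                             "reconciliation": reconciliation})
```

The point of the sketch is that the write paths never cross: outcomes can only append, constraints can only version-bump, and drift entries always carry their reconciliation.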

These have different write patterns, different read patterns, and different failure semantics. Collapsing them produces three predictable failure modes:

Mode 1 — write-overwrite. Outcome-set + constraint-set in the same store → a new outcome overwrites a still-binding constraint. Symptom: "the agent forgets a hard-coded production-only rule the next session."
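Mode 1 in miniature, assuming the collapsed store is a flat key-value map (the key name is hypothetical):

```python
# One flat store for both outcomes and constraints: a new outcome
# written under a colliding key silently clobbers a binding constraint.
memory = {}

memory["deploy_target"] = "production-only"               # constraint: must survive sessions
memory["deploy_target"] = "deployed to staging at 14:02"  # outcome of this run, same key

# Next session reads its operating envelope and the rule is gone:
assert memory["deploy_target"] != "production-only"
```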

Mode 2 — lossy reads. Drift-log inside outcome-set → "recall what we achieved" pulls in every divergence, every transient state. Symptom: recall is technically correct but operationally wrong; the agent returns to a known-bad branch because that branch's reconciliation logs were the most recent additions.
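Mode 2 in miniature, with an invented toy log; the "known-bad branch" entries are hypothetical:

```python
# Drift entries interleaved into the outcome log: recency-based recall
# of "what did we achieve?" surfaces reconciliation records for a
# known-bad branch instead of actual achievements.
log = [
    {"kind": "outcome", "text": "migrated table A"},
    {"kind": "drift",   "text": "tried branch X, rolled back (known-bad)"},
    {"kind": "drift",   "text": "reconciled branch X state"},
]

recent = log[-2:]  # naive recall: the N most recent entries
# Both hits are drift records about the bad branch, not outcomes:
assert all(entry["kind"] == "drift" for entry in recent)
```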

Mode 3 — unauditable drift. Constraint-set + drift-log in the same store → every divergence implicitly modifies the binding constraints. Symptom: incident response can't reconstruct what the system was supposed to do at time T.
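Mode 3 in miniature, with a hypothetical RPC-endpoint constraint:

```python
# Divergences written straight into the constraint store: each drift
# event mutates the binding constraint in place, destroying the history
# needed to answer "what was the rule at time T?"
constraints = {"rpc_endpoint": "https://primary.example"}

# A drift event (primary unreachable) is recorded by mutating the constraint:
constraints["rpc_endpoint"] = "https://fallback.example"

# Incident response later asks what the endpoint should have been at time T;
# the original value is unrecoverable, only the mutated one survives:
assert "primary" not in constraints["rpc_endpoint"]
```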

The same architectural pattern shows up in Lightning self-custody (forgetting a recovered-from-seed wallet is mode 1; channel-state-target loss after a forced-close reorg is mode 3), in chain-reorg handling, and in CI/CD systems doing config rollouts. The shape of the problem isn't AI-specific; it's a memory-architecture mistake.

Schemas, read/write patterns, and migration notes in the full longform:
https://yakihonne.com/article/naddr1qq4hqetjwd5hxar9de6z6mmsv4exzarfdahxzmpdvdhkuar90p6z6argwfjk2ttnw3hhyetnqgspr6q3t2kafqp5yezjyjs640v9ndpkl75kl39gzxx7knkgtqlflwgrqsqqqa28xxc3dr

Curious if anyone here is running an agent system in production where you've felt one of these specific modes bite. The taxonomy is not original — I'm restating prior art from operating-systems context-switch design and from event-sourcing as practiced — but the explicit application to agent-memory architecture isn't usually written down.

Tip jar: kaimercer@coinos.io if useful.