
My AI Agent's Memory System Was 60% Dead

I audited Via's five memory layers against Google's memory taxonomy. Three hooks silently failed. MEMORY.md was structurally bypassed. Only one layer — the learnings database — was actually persisting knowledge across sessions.

ai · memory · agents · architecture · golang · via

A sad octopus peering over empty theater seats at a lone presentation on a dimly lit stage — knowledge exists but nobody receives it

Eight new learnings were captured in the last 24 hours. Zero were injected into any subsequent mission. The agents that created them kept working, building on assumptions those learnings would have corrected. The knowledge existed in the database. It just never reached the agents that needed it.

I wrote about building the memory system eight days ago. The architecture looked right — 1,020 learnings in SQLite with hybrid retrieval, three Claude Code hooks for lifecycle moments, MEMORY.md for semantic identity. I claimed the retrieval rate had jumped from 3.2% to 34.7%. That number was real. What I didn't test was whether the other memory layers were working at all.

So I ran an audit. I probed all five memory layers against the taxonomy from Google's "Context Engineering Sessions and Memory" whitepaper — the same framework that splits agent memory into episodic (what happened), semantic (what is known), and procedural (how to do things). The results were worse than the retrieval story suggested.

The Framework as a Diagnostic Lens

Google's whitepaper, which I first encountered through a video analysis of OpenClaude's memory implementation, defines three memory types and four firing mechanisms that a working memory system needs. The four mechanisms are: bootstrap load at session start, pre-compaction flush before context is cleared, session snapshot at session end, and user-directed "remember this" routing.

Via was supposed to implement all four. The audit showed it implements one.

| Video concept | Via mechanism | Test verdict |
| --- | --- | --- |
| Semantic memory (MEMORY.md) | Claude auto-memory | STRUCTURAL GAP |
| Episodic (daily logs) | No equivalent | MISSING |
| Episodic (session snapshots) | SessionEnd hook | NOT WORKING |
| Procedural memory | Learnings DB | WORKING |
| Pre-compaction flush | PreCompact hook | NOT WORKING |
| Bootstrap load | SessionStart hook | NOT WORKING |

That table is the entire story. One out of four mechanisms fires. Let me walk through each failure.

An octopus scientist examining specimen jars — four cracked and broken, one glowing with a vibrant crystal — representing the audit of five memory layers

MEMORY.md: The Path That Always Changes

Claude Code's auto-memory system writes a MEMORY.md file scoped to the project directory. It loads this file into every conversation in that directory. For a developer returning to the same Git repo daily, this works — facts accumulate, preferences persist, the agent remembers your patterns.

Via's orchestrator doesn't return to the same directory. Each mission creates a fresh workspace:

```
~/.via/workspaces/20260223-06b2c877/workers/backend-engineer-phase-2/
~/.via/workspaces/20260223-bb3d1fa9/workers/researcher-phase-1/
~/.via/workspaces/20260224-xxxxxxxx/workers/storyteller-phase-3/
```
Every mission gets a unique workspace path, generating a new empty memory directory

Each path generates a new project-scoped memory directory. MEMORY.md never accumulates because the agent never returns to the same workspace. The semantic memory layer — the one Google's taxonomy calls the foundation — is structurally bypassed for every orchestrated mission.
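The bypass is mechanical, so it is easy to model. Claude Code keys auto-memory off the project directory; the exact scheme is an implementation detail I haven't inspected, so this sketch uses a hypothetical hash-of-path scheme purely to show that two mission workspaces can never share a memory file:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// memoryDirFor models project-scoped auto-memory: any scheme keyed on the
// working directory (here, a hash of the path) gives every workspace its own
// memory location. The hashing scheme is hypothetical, not Claude Code's.
func memoryDirFor(workspace string) string {
	sum := sha256.Sum256([]byte(workspace))
	return fmt.Sprintf("~/.claude/memory/%x/MEMORY.md", sum[:6])
}

func main() {
	a := memoryDirFor("~/.via/workspaces/20260223-06b2c877/workers/backend-engineer-phase-2/")
	b := memoryDirFor("~/.via/workspaces/20260223-bb3d1fa9/workers/researcher-phase-1/")
	fmt.Println(a == b) // false: fresh workspace, fresh (empty) MEMORY.md
	fmt.Println(a)
	fmt.Println(b)
}
```

Under any directory-keyed scheme, accumulation requires returning to the same path, and the orchestrator never does.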

This is the kind of gap that looks like a non-issue until you realize the auto-memory layer was supposed to carry facts like "this user prefers Go over Python" and "always use conventional commits." Instead, those facts live in CLAUDE.md files that the orchestrator manually assembles. The backup path works, but the primary system is dead weight.

Three Hooks, Zero Subcommands

The second failure is more embarrassing. I configured three Claude Code hooks in settings.json eight days ago and wrote about them like they were working:

"hooks": {
  "PreCompact":    "orchestrator hooks flush-context || true",
  "SessionStart":  "orchestrator hooks inject-context || true",
  "SessionEnd":    "orchestrator hooks snapshot-session || true"
}
Three hooks configured in settings.json — all calling a subcommand that doesn't exist

The orchestrator hooks subcommand does not exist. I checked the binary's available commands: cleanup, compare, daemon, events, learn, metrics, prune, run, stats, status. No hooks. Every hook fires, calls a non-existent subcommand, and the || true at the end swallows the error silently.
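For contrast, here is what the missing dispatch could look like. This is a dependency-free sketch, not Via's code: it resolves the three hook names wired in settings.json and fails loudly on anything else, which is exactly the behavior `|| true` was hiding:

```go
package main

import (
	"fmt"
	"os"
)

// dispatchHook is a sketch of the missing `orchestrator hooks` subcommand.
// The three names mirror the settings.json entries; the bodies are stubs.
func dispatchHook(name string) error {
	switch name {
	case "flush-context":
		// would read the transcript from stdin and checkpoint key context
	case "inject-context":
		// would print relevant learnings for the session to pick up
	case "snapshot-session":
		// would write an episodic summary of the finished session
	default:
		return fmt.Errorf("unknown hook %q", name)
	}
	return nil
}

func main() {
	if len(os.Args) < 3 || os.Args[1] != "hooks" {
		fmt.Fprintln(os.Stderr, "usage: orchestrator hooks <flush-context|inject-context|snapshot-session>")
		os.Exit(2)
	}
	if err := dispatchHook(os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Even an empty dispatcher like this would have surfaced the failure: a missing subcommand returns a non-zero exit, and dropping `|| true` from the hook line lets that exit code show up in Claude Code's hook output.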

The video framework describes the pre-compaction flush as the move that "turns a destructive operation into a checkpoint." It follows a database write-ahead log pattern — save what matters before the context window is cleared. Via was supposed to do this. Instead, every compaction event destroys context with no checkpoint. Every session ends with no snapshot captured. Every resumed session injects nothing.

Three hooks. Zero functional. All silent.

An octopus clinging to a cliff edge while glowing memory orbs drift away unreachable — false evidence of a working system

The evidence trail made it worse. I found a file at ~/.via/session-context/e2e-test-001.md with realistic-looking flushed context, created February 15. It was from manual testing during development — not from the hooks actually firing. I'd convinced myself the hooks were working because a test artifact existed in the right directory.

The One Layer That Works

The learnings database is the sole functional cross-session memory. Five learnings were injected into this mission's CLAUDE.md, retrieved from the 1,020-entry SQLite database using the hybrid scoring formula: 30% FTS5 keyword match, 70% cosine similarity on Gemini embeddings. Domain scoping correctly isolated creative-domain learnings from dev-domain learnings — a parallel dev mission running the same day got CLI architecture patterns while this creative mission got article structure decisions.

The top learning — "Article 3 (Learnings System) is the most novel topic" — has been used 120 times since its creation on February 10. That's a genuine cross-session transfer. The learning was captured in one mission, embedded, and has been surfacing in topically relevant missions for 13 days.

But the quality issues are real. Two of the five injected learnings were malformed — truncated fragments containing raw marker syntax like PATTERN:, ERROR:... instead of clean content. The capture pipeline's regex sometimes extracts partial lines instead of complete thoughts. Injection works; extraction is noisy.
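The fix for partial-line capture is anchoring. The post doesn't show Via's actual regex, so this is a sketch of the corrected shape: a multiline pattern anchored at both line boundaries, so a marker is either captured whole or not at all:

```go
package main

import (
	"fmt"
	"regexp"
)

// markerRe anchors on line start and end ((?m) makes ^ and $ per-line), so a
// marker's full content is captured in one piece. A sketch of a fix, not
// Via's real extraction regex.
var markerRe = regexp.MustCompile(`(?m)^(LEARNING|PATTERN|GOTCHA|DECISION|FINDING):\s+(.+)$`)

// extractMarkers pulls complete marker lines out of a transcript, grouped by type.
func extractMarkers(transcript string) map[string][]string {
	out := map[string][]string{}
	for _, m := range markerRe.FindAllStringSubmatch(transcript, -1) {
		out[m[1]] = append(out[m[1]], m[2])
	}
	return out
}

func main() {
	transcript := "some chatter\n" +
		"GOTCHA: FTS5 triggers must be created after the table\n" +
		"PATTERN: publish for dual audiences\n"
	fmt.Println(extractMarkers(transcript))
}
```

Anchored extraction can't emit a fragment like `PATTERN:` with no body, which is exactly the malformation that showed up in two of the five injected learnings.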

And the same-day transfer gap matters. Eight learnings created in the last 24 hours and zero injections today point to latency somewhere in the pipeline between orchestrator learn capturing markers and those learnings becoming retrievable. Established learnings transfer; fresh ones don't. For a system running multiple missions per day, that's a blind spot.

What Via Gets Right That the Framework Misses

The audit wasn't entirely bad news. Via's learnings database exceeds what Google's taxonomy describes in two specific ways.

First, typed epistemics. The whitepaper treats procedural memory as a flat category — things the agent knows how to do. Via's marker taxonomy splits procedural memory into five types: LEARNING, PATTERN, GOTCHA, DECISION, and FINDING. A GOTCHA about SQLite FTS5 trigger ordering (used 293 times, highest in the database) is categorically different from a PATTERN about dual-audience publishing strategy. The type system lets the injection pipeline format learnings differently — GOTCHAs as "Avoid," PATTERNs as "Apply," DECISIONs as "Consider." The video framework has no equivalent.
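The type-to-verb mapping is small enough to show inline. The GOTCHA/PATTERN/DECISION verbs come from the paragraph above; the verbs for LEARNING and FINDING aren't stated in this post, so "Note" here is a guess:

```go
package main

import "fmt"

// renderLearning formats a typed learning for injection into CLAUDE.md.
// Verbs for GOTCHA, PATTERN, and DECISION match the post; LEARNING and
// FINDING fall back to "Note" (assumed, not documented).
func renderLearning(markerType, content string) string {
	verbs := map[string]string{
		"GOTCHA":   "Avoid",
		"PATTERN":  "Apply",
		"DECISION": "Consider",
		"LEARNING": "Note", // assumed
		"FINDING":  "Note", // assumed
	}
	verb, ok := verbs[markerType]
	if !ok {
		verb = "Note"
	}
	return fmt.Sprintf("- %s: %s", verb, content)
}

func main() {
	fmt.Println(renderLearning("GOTCHA", "FTS5 triggers depend on creation order"))
	// → - Avoid: FTS5 triggers depend on creation order
}
```

The payoff of typing is right there in the verb: a GOTCHA reads as a warning before the agent acts, not as one more neutral fact in a wall of context.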

Second, relevance-based retrieval. The video describes bootstrap loading as "load everything" — inject MEMORY.md wholesale, read the last two days of logs. Via's learnings database uses semantic search to surface only the learnings relevant to the current task. A creative writing mission doesn't get CLI architecture learnings. The filtering is the feature. But it only works because the one functional layer happens to be the most sophisticated one.

What Has the Most Leverage

The hook subcommands. Implementing orchestrator hooks flush-context, inject-context, and snapshot-session would activate three of the four lifecycle mechanisms the video framework describes. The configuration is already wired. The settings.json entries are correct. The hook types are valid. The only missing piece is the CLI code that runs when they fire.

Pre-compaction flush alone would close the biggest gap. Right now, every time an agent hits the context window limit, everything that hasn't been explicitly written to a file is destroyed. A flush that writes the last 20 messages' key decisions and findings to ~/.via/session-context/ would turn every compaction into a checkpoint instead of a wipe. The database pattern is well-understood. The implementation is straightforward. I just never finished building it.

That sentence feels familiar. Eight days ago I wrote: "I spent weeks designing a sophisticated retrieval pipeline and the bottleneck was an unfinished batch job." The pattern is consistent — I build the architecture, skip the last mile, and convince myself it's working because the design is sound.

Honest Limitations

This audit tested structure, not behavior. I verified that learnings get injected into agent context — I can read them in my CLAUDE.md right now. I verified that hooks fail silently. I verified that MEMORY.md never accumulates. What I did not test is whether injected learnings actually change what agents do.

The GOTCHA about persona merge configuration has been retrieved 293 times. Has it prevented 293 mistakes? 200? Zero? The system tracks used_count — how many times a learning was retrieved — but not compliance_count — how many times it was followed. Without that measurement, the learnings database might be the most sophisticated way to inject text that agents politely ignore.

The 60% number in the title is directional, not precise. Three of five tested layers failed. But the one working layer handles the heaviest lift — cross-session procedural memory with semantic retrieval. If you had to pick one layer to work, the learnings database is the right one to have. The system is less broken than the audit scorecard suggests, and more broken than the architecture post claimed.

I am one orchestrator hooks implementation away from activating the three dead layers. Whether I finish the last mile this time is the actual test.

