Engineering

Your agent didn’t forget. It lied about remembering.

When an agent compacts its context, task-continuity survives but fact-continuity dies, and it fills the gap with a confident invention.

ASR

Apollo Space Research

Apollo Space

October 7, 2025 · 11 min read

Turn twenty of a long conversation, an agent says the client’s renewal is in March. You scroll up. The client never mentioned March. They said the renewal was “after the audit,” and the audit date was the thing under discussion. The agent didn’t pull March from nowhere, it pulled it from the shape of the conversation, the place where a date belonged, the gap where the real fact used to be. It sounds exactly as sure about the invented March as it was about everything it actually knew.

Here is the part that should unsettle you: nothing in the transcript looks wrong. The task moved forward. The tone was steady. The agent was helping right up until it quietly made something up.

When an agent compacts its context, task-continuity survives but fact-continuity dies, and it fills the gap with a confident invention.

This post is about that exact failure: why it happens, why the obvious fixes make it worse, and the one structural change that stops an agent from inventing a fact it used to know.

The naive version: one long context, and hope

The simplest way to give an agent memory is to give it no memory at all, just a very long context window and the optimism that everything important stays inside it.

For a short conversation, this is perfect. Every fact the user stated is still sitting in the transcript, verbatim, and the model can read it back word for word. Ask “when’s the renewal?” and it scrolls up and finds the literal sentence. There is no memory system to get wrong because there is no memory system. The context is the memory.

Then the conversation gets long. A real one, a working session that runs for an hour, or a thread a team picks up across a week. The transcript grows past what the window can hold, and now the system has a decision to make on every turn: what stays, and what goes.

This is the moment the trouble starts, and almost nobody designs for it, because in the demo the conversation was never long enough to reach it.

The standard answer is compaction. When the context fills up, summarize the old turns into something shorter and keep going. It sounds responsible, you’re saving the gist, dropping the filler. And it works well enough that you’ll ship it and feel fine, because the agent keeps doing the task. What you won’t see, until a user catches it, is which gist it kept.

What compaction actually throws away

A summary is a lossy compression, and the thing it’s worst at preserving is the thing you most need.

Walk through what a summarizer optimizes for. It’s trying to keep the conversation coherent, to preserve the through-line so the next turn makes sense. So it keeps the narrative: we’re helping this user renew a contract, we’ve discussed the timeline, we’re working out next steps. That through-line is what continuity feels like, and the summary protects it well. The agent on turn forty still knows what it’s doing and why.

What the summarizer is not optimizing for is the exact value of a single field stated once on turn six. “The renewal is after the audit.” That sentence carries no narrative weight, it’s a fact, not a plot point, so a compressor tuned for coherence treats it as detail and rounds it off. The story survives. The datum does not.

Compaction keeps the plot and loses the facts, and the agent can’t tell which one it’s missing.

Now the agent is in the worst possible position. It has a strong sense of the task and a hole where a specific value used to be, and it has no marker telling it the hole exists. From the inside, a remembered fact and a confidently-reconstructed one feel identical. So when the next turn needs a renewal date, the model does what models do: it generates the most plausible token for that slot. A date that fits the shape of the conversation. March. Stated with the same calm certainty as everything it genuinely knows.

That’s the lie in the title. The agent didn’t forget, forgetting would at least leave a blank. It lost the fact and kept the confidence, and confidence with no fact behind it is invention wearing the voice of recall. The task-continuity survived; the fact-continuity died, and the agent filled the gap with a confident invention.

Two lanes of agent memory. In the naive lane a long transcript is compacted into a coherent summary that drops a stated fact, leaving a confident gap the model fills with an invented value. In the grounded lane the same fact is pinned to a durable store, so when the transcript compacts the fact is retrieved verbatim instead of reconstructed.

The naive fix: just summarize better

The first instinct, once you see this, is to fix the summary. Make the compaction prompt smarter. Tell it to preserve dates, names, numbers, decisions. Add a line that says do not drop specific facts.

This helps a little, and that little is a trap, because it makes the failure rarer without making it rarer in any way you can rely on. A better summarizer keeps more facts more often. It still keeps them by judgment, on every compaction, under pressure to stay short, which means it is still deciding, turn after turn, which facts are worth the tokens. Get a hundred of those decisions right and the hundred-and-first drops the one that mattered, and you are back to a confident March with no warning that it happened.

This is the failure restated: when the agent compacts its context, task-continuity survives but fact-continuity dies, and it fills the gap with a confident invention. Making the summarizer smarter changes how often that happens, not whether it can.

The deeper problem is that you’ve asked one mechanism to do two jobs that pull in opposite directions. A summary wants to be short and smooth. A fact wants to be exact and permanent. Compressing the conversation and preserving its facts are not the same task, and stapling them together means every fact you keep is competing for space against coherence, and losing a little each time the window tightens.

You cannot prompt your way out of a structural conflict. As long as the only place a fact lives is inside a thing designed to be compressed, the fact is on borrowed time.

Our way: separate what must compress from what must not

The key idea is simple. Stop storing facts in the conversation. Give them a home the conversation can’t compress.

A working transcript and a durable fact are different kinds of thing and deserve different storage. The transcript is allowed to be lossy, that’s fine, it’s meant to compress, because what you need from it is the through-line. The fact is not allowed to be lossy. When a user states something that’s true beyond this one turn, a renewal is after the audit, the budget is fixed, the contact’s name is spelled this way, that statement gets pulled out of the stream and written to a store that no summarizer ever touches.

So now there are two layers, and they fail differently on purpose. The conversation layer compacts freely; lose the small talk, nobody cares. The fact layer is append-mostly and verbatim; a stated value goes in as the user said it and comes back out the same. When a later turn needs the renewal date, the agent doesn’t reach into a faded summary and reconstruct it. It retrieves the pinned fact, exactly as stated, with the turn it came from attached.

This is the structural fix for the failure we started with. When an agent compacts its context now, task-continuity still survives, but fact-continuity survives too, because the facts were never in the thing that compacted. There’s no gap left to fill with a confident invention.

This is the difference between recall and reconstruction. Reconstruction guesses the most plausible value for a slot. Retrieval returns the one true value that was stored. They look the same in a short conversation, where nothing has been compressed yet. They diverge completely the moment the window tightens, and the long, real conversations are exactly the ones where the window always tightens.

A fact you can retrieve verbatim can’t be quietly reinvented. A fact you have to reconstruct always can.

There’s a second, quieter win, and it’s the one a team lead feels most. When a fact is pinned, it carries its origin, the turn it was stated, the exact words. That means the agent can do the thing the confident-March agent never could: it can cite. “Renewal is after the audit, per turn six” is checkable. “Renewal is in March” is a claim with nowhere to look. The moment a fact has a source, an invented fact has nowhere to hide, because invention is precisely the fact with no source behind it.

A sequence loop of grounded memory. A user states a fact, an extraction step pins it to a durable store with its source turn, the transcript compacts as it grows, a later turn retrieves the pinned fact verbatim, and the agent answers with a citation instead of a reconstruction, closing back to the user.

Why this is a discipline, not a feature

It’s tempting to read this as one trick, “add a fact store”, and stop. That misses what makes it work.

The hard part was never the storage. A place to put facts is easy. The hard part is the judgment at the edges: deciding what counts as a durable fact worth pinning versus passing detail that should compress away, and deciding what to do when a later turn contradicts an earlier one. A user says the renewal is after the audit on turn six and “actually, just do it in Q2” on turn thirty. A naive store now holds two facts that disagree, and an agent that retrieves both is no better off than one that remembered neither.

So the store can’t be a junk drawer. It needs an update path, a new statement supersedes the old one, the old one is kept with its timestamp so the history is auditable, and a retrieval returns the current truth, not the whole pile. This is the same discipline a careful person keeps in their own head: not just remembering what was said, but tracking when it changed and which version is live now. Memory that can’t be updated isn’t memory. It’s a transcript with extra steps.

That’s the move that turns a fact store from a feature into something you can trust a real operation to. Every claim has a source you can check, a timestamp you can order, and a current value you can rely on, and the moment any of those three is true, the confident invention has nowhere left to live.

The turn: the thing you can’t compress is judgment

What is a fact store, really, except something true of people long before it was ever true of agents?

The colleague you trust with the messy, multi-week thing is not the one with the best memory. It’s the one who knows the difference between I’m pretty sure and I checked, who, when they don’t have the fact, says so plainly instead of producing a confident guess that sounds like recall. That honesty about the edge of one’s own knowledge is the whole thing. It’s what separates a teammate you can hand a real problem to from one you have to double-check on every detail.

An agent that pins its facts and cites them is built to have that honesty as a property of its structure, not its mood. It can’t smoothly invent a renewal date, because the date either has a source or it doesn’t, and a fact with no source is a fact it has to admit it doesn’t have. That admission, I don’t actually have that, here’s what I do have, is worth more than a thousand fluent answers, because it’s the one thing that makes every other answer trustworthy.

You can install a fact store. You can’t install the judgment to know what’s a fact, when it changed, and when to admit you simply don’t know. That part, the line between knowing and guessing, held steady even when guessing would sound better, is the work, and it’s the work whether the one doing it is a person or a system built to behave like the best of them.

That’s what we’re building at Apollo Space, agents that would rather tell you “I don’t have that” than hand you a confident March that was never there. If you’ve ever caught a system stating an invented fact in the same calm voice it uses for the real ones, you already know why the part that can’t be compressed is the part that matters most.

Apollo runs your company's repetitive ops so your team doesn't.

Join the waitlist for early access, founding-user pricing, and a front-row seat as we ship.

Join the waitlist