
Your vector index is quietly going stale

Embeddings drift, content changes, the model updates, and the index keeps confidently answering with yesterday. Re-embedding is the maintenance job nobody put on the calendar.


Apollo Space Research



A pricing page changed on Monday. The number in the contract went up. By Wednesday, an agent answered a customer with the old price, confidently, in a complete sentence, with a citation, and nobody could see anything wrong. The retrieval worked. The model worked. The answer was wrong because the index was repeating a document that no longer exists in that form.

Nothing broke. That’s the whole problem.

Most teams treat their vector index the way they treat a database table they wrote once and never touched again. It sits there, fast and quiet, returning the nearest neighbors to whatever you ask. And every day it drifts a little further from the truth it was built to hold. A vector index is not a snapshot you take once; it is a living copy of your knowledge that decays the moment the world it copied moves on. This post is about the three ways it decays, why none of them throws an error, and what it takes to keep an agent’s memory honest.

The naive mental model: embed once, search forever

The first time you build retrieval, the loop feels finished the day it works. You chunk your documents, run each chunk through an embedding model, store the vectors, and search them. A question comes in, you embed it, you find the closest chunks, you hand them to the model. The demo is clean. The answers are grounded. You move on.

The hidden assumption in that loop is that the index is a photograph: take it once, and it holds. But you didn’t photograph your knowledge. You translated it, into a particular coordinate space, with a particular model, on a particular day. And translations go stale three different ways, none of which announces itself.

The first is the obvious one and the one teams still miss: the content changed and the index didn’t. A document gets edited, a policy updates, a contract is re-signed, a page is deleted. The source of truth moves. The vectors don’t. Your index now holds a faithful copy of a document that no longer says what the copy says.

The second is subtler. The query and the documents stop speaking the same language. Embeddings are only comparable if they came from the same model. The day you upgrade your embedding model (for better quality, lower cost, or longer context), your new query vectors live in a different coordinate space from your old document vectors. Distances between them become noise. Search still returns something, ranked and plausible-looking. It’s just measuring nearness in a space where nearness no longer means anything.

The third is the quietest of all. The meaning drifted even though the words didn’t. “The deal” meant a prospect in March and a signed customer in June. “Churned” got redefined. A product was renamed. The chunk still embeds fine, but the concept it points at has moved underneath it, and retrieval keeps surfacing it for the old meaning.

Three kinds of staleness. Zero exceptions thrown. The index is happy to return yesterday forever.

Figure: Three ways a vector index goes stale, all silent: the source document is edited but the stored vector is not re-embedded; the embedding model is upgraded so new query vectors and old document vectors no longer live in the same space; and a concept is renamed so the words match but the meaning drifted. Every path returns a confident, plausible, wrong answer with no error raised.

Why staleness is worse than a crash

Here is the uncomfortable thing about a stale index: a crash would be a mercy.

When a system crashes, you know. There’s a stack trace, a red dashboard, a page at 4am. The failure is loud, it’s located, and someone fixes it. A stale index does the opposite. It degrades into confident wrongness, which is the single most expensive failure mode an agent can have, because it spends your trust to do it.

Consider the cost in three layers. The first is a wrong answer: annoying, recoverable, usually caught. The second is a wrong answer delivered with a citation, which is much worse, because the citation makes a human stop checking. The third, and the one that actually hurts, is a wrong answer delivered with a citation by an agent that then acts on it: it sends the email, quotes the price, files the task, or books the meeting against a date that already passed. The staleness didn’t stay in the answer. It became a decision.

A confident answer from a stale index is not a bug report. It’s a successful operation returning the wrong result.

And the reason this slips past every test you wrote is that your tests check the mechanism, not the freshness. Does retrieval return k chunks? Yes. Are they ranked by similarity? Yes. Does the model ground its answer in them? Yes. Every check is green. None of them asks the only question that matters: are these chunks still true? You can have a hundred passing tests and a perfectly broken memory, because the thing that rotted was never the code. It was the gap between when you embedded and when the world moved.
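One way to turn “are these chunks still true?” into something a test can ask is to store a fingerprint of the source text next to each vector and compare it with the live document. The sketch below assumes a simple in-memory layout; `IndexedChunk`, `content_hash`, and `find_stale` are hypothetical names for illustration, not the API of any particular vector store, and the sample documents are made up.

```python
import hashlib
from dataclasses import dataclass


def content_hash(text: str) -> str:
    """Fingerprint of the exact text that was embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class IndexedChunk:
    doc_id: str
    source_hash: str      # hash of the source text at embedding time
    vector: list[float]   # placeholder values; not used by the check


def find_stale(index: list[IndexedChunk], live_docs: dict[str, str]) -> list[str]:
    """Return doc_ids whose indexed copy no longer matches the live source.

    A document counts as stale if it was deleted or its text has changed
    since the vector was computed.
    """
    stale = []
    for chunk in index:
        live_text = live_docs.get(chunk.doc_id)
        if live_text is None or content_hash(live_text) != chunk.source_hash:
            stale.append(chunk.doc_id)
    return stale


# Illustrative data: the pricing page was edited after it was embedded.
index = [
    IndexedChunk("pricing", content_hash("Plan costs $40/month"), [0.1, 0.9]),
    IndexedChunk("faq", content_hash("Support hours are 9-5"), [0.7, 0.2]),
]
live_docs = {
    "pricing": "Plan costs $55/month",
    "faq": "Support hours are 9-5",
}
stale_ids = find_stale(index, live_docs)
print(stale_ids)
```

A check like this only catches the first kind of staleness, edited or deleted content. It says nothing about model mismatch or drifting meaning.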

The bottleneck never disappears. It just moves from “is the retrieval correct” to “is the retrieval current”, and the second question has no error code.

The naive fix, and why it fails

So you schedule a re-embed. Once a night, re-embed everything, rebuild the index, swap it in. Problem solved.

It isn’t, for two reasons that show up fast.

The first is cost and time. Re-embedding everything, every night, means paying to translate documents that didn’t change, which, on most days, is nearly all of them. As your knowledge grows, the nightly rebuild grows with it, until the “once a night” job doesn’t finish before morning. You’re now burning compute to re-derive vectors that were already correct, and the freshness window is still a full day wide. A price that changed at 9am waits until 2am to enter the index. The customer asked at 11.

The second is worse: a full rebuild only fixes the first kind of staleness. Re-embedding edited content with the same old model leaves the model-upgrade problem and the meaning-drift problem completely untouched. You did the expensive thing and fixed only one of the three kinds of decay.

The naive fix treats freshness as a batch job. But freshness isn’t a batch property. It’s an event property. A document didn’t go stale on a schedule; it went stale the instant someone edited it. The right trigger for re-embedding is not the clock. It’s the change.

This is the same lesson reactive systems learned a generation ago. You don’t poll the world on a timer asking “did anything happen?” You let the world tell you when something happened, and you respond to that. An index that re-embeds on the clock is polling. An index that re-embeds on the change is listening.

Our way: treat the index as a derived view that knows when it’s dirty

The key idea is simple. The vector index is not a store. It is a derived view of a source of truth, and a derived view’s only job is to know when it’s out of date.

Say that differently, because it’s the load-bearing reframe. Your documents are the truth. The vectors are a computed projection of that truth into a searchable space. The moment you treat the projection as the truth, as a thing you write once and trust forever, you’ve lost the ability to know when it’s wrong. The moment you treat it as derived, three obligations fall out, one for each kind of staleness.

Obligation one: re-embed on the change, not the clock. When a source document is created, edited, or deleted, that event re-embeds exactly that document’s chunks and updates exactly those vectors. Nothing else moves. The freshness window collapses from a day to seconds, and the cost collapses to the size of the change instead of the size of the corpus. The price that changed at 9am is searchable at 9:01, and you paid for one document, not ten thousand.
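A minimal sketch of change-driven re-embedding, under some loud assumptions: `toy_embed` is a deterministic stand-in for a real embedding call, each document is treated as a single chunk, and `ChangeDrivenIndex` with its `on_change` handler is a hypothetical name, not an existing library. The point it illustrates is that embedding work tracks the number of changes, not the size of the corpus.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: deterministic, not semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


@dataclass
class ChangeDrivenIndex:
    vectors: dict[str, list[float]] = field(default_factory=dict)
    embed_calls: int = 0  # counts how much embedding work was paid for

    def on_change(self, doc_id: str, new_text: Optional[str]) -> None:
        """Handle one create, edit, or delete event for one document."""
        if new_text is None:
            # Deletion: drop the vector so it can never be retrieved again.
            self.vectors.pop(doc_id, None)
            return
        # Create or edit: re-embed only this document.
        self.vectors[doc_id] = toy_embed(new_text)
        self.embed_calls += 1


index = ChangeDrivenIndex()

# Initial build: every document arrives as a "create" event.
for i in range(1000):
    index.on_change(f"doc-{i}", f"body {i}")
calls_after_build = index.embed_calls

# Later: one edit and one delete.
index.on_change("doc-7", "body 7, with the new price")
index.on_change("doc-9", None)

calls_for_changes = index.embed_calls - calls_after_build
print(calls_for_changes, len(index.vectors))
```

In a real system the events would come from wherever the source of truth lives (a webhook, a change feed, a file watcher). That wiring is left out here.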

Obligation two: version the embedding space, and never mix versions. Every vector carries the identity of the model that made it. A query embedded by the new model is only ever compared against documents embedded by the same model. When you upgrade, you don’t flip a switch and pray. You backfill the corpus into the new space in the background, run both spaces in parallel, and cut over only when the new space is complete. No query ever measures distance across two coordinate systems, because that distance is meaningless and the system knows it.
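As a rough illustration of versioned spaces, the sketch below keeps one set of vectors per model version, refuses any query that was embedded with a version other than the one being served, and blocks cutover until the backfill covers every document. `VersionedIndex`, `upsert`, `search`, and `cut_over` are hypothetical names, the vectors are hand-written placeholders, and dot product stands in for whatever similarity measure a real store would use.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedIndex:
    active_version: str
    # spaces[model_version][doc_id] -> vector
    spaces: dict[str, dict[str, list[float]]] = field(default_factory=dict)

    def upsert(self, model_version: str, doc_id: str, vector: list[float]) -> None:
        self.spaces.setdefault(model_version, {})[doc_id] = vector

    def search(self, query_version: str, query: list[float], k: int = 1) -> list[str]:
        """Rank documents by dot product, only inside the query's own space."""
        if query_version != self.active_version:
            raise ValueError(
                f"query embedded with {query_version!r}, "
                f"index is serving {self.active_version!r}"
            )
        space = self.spaces.get(query_version, {})
        scored = sorted(
            space.items(),
            key=lambda item: sum(q * d for q, d in zip(query, item[1])),
            reverse=True,
        )
        return [doc_id for doc_id, _ in scored[:k]]

    def cut_over(self, new_version: str) -> None:
        """Switch serving to new_version only once its backfill is complete."""
        old_ids = set(self.spaces.get(self.active_version, {}))
        new_ids = set(self.spaces.get(new_version, {}))
        missing = old_ids - new_ids
        if missing:
            raise RuntimeError(f"backfill incomplete: {len(missing)} docs missing")
        self.active_version = new_version


index = VersionedIndex(active_version="v1")
index.upsert("v1", "pricing", [1.0, 0.0])
index.upsert("v1", "faq", [0.0, 1.0])

# Backfill into the new space has started but is not finished.
index.upsert("v2", "pricing", [0.0, 1.0, 0.0])
try:
    index.cut_over("v2")
    cutover_blocked = False
except RuntimeError:
    cutover_blocked = True

# Backfill finishes, so the cutover is now allowed.
index.upsert("v2", "faq", [1.0, 0.0, 0.0])
index.cut_over("v2")
top = index.search("v2", [0.0, 1.0, 0.0])
print(cutover_blocked, top)
```

Raising on a version mismatch is a deliberate choice in this sketch: a loud failure replaces the silent, plausible-looking ranking that cross-space comparison would otherwise produce.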

Obligation three: let meaning-drift surface as a contradiction, not a silent answer. This is the hardest one, and it’s where an index stops being a lookup and starts being a memory. When the same concept is described two incompatible ways across time (a deal that’s both “prospect” and “closed,” a price that’s both old and new), the system shouldn’t quietly return the higher-ranked one. It should notice the conflict and resolve it toward the most recent, most authoritative source, the same way a careful person says “wait, didn’t that change?” A good memory isn’t the one that retrieves fastest. It’s the one that catches itself repeating something that’s no longer true.
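The resolution step can be sketched in a few lines, with one large assumption stated up front: something upstream has already decided that two retrieved chunks are about the same subject, which is the genuinely hard part and is not shown. `RetrievedFact`, its fields, the `authority` ranking, and `resolve` are all hypothetical, and the sample hits are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetrievedFact:
    subject: str     # what the claim is about, e.g. "pro-plan-price"
    value: str       # the claimed value
    updated: date    # when the source was last edited
    authority: int   # higher means a more authoritative source (assumed ranking)
    score: float     # similarity score from retrieval


def resolve(hits: list[RetrievedFact]) -> tuple[list[RetrievedFact], list[str]]:
    """Keep one fact per subject and report subjects whose hits disagreed."""
    by_subject: dict[str, list[RetrievedFact]] = {}
    for hit in hits:
        by_subject.setdefault(hit.subject, []).append(hit)

    kept, conflicts = [], []
    for subject, group in by_subject.items():
        if len({h.value for h in group}) > 1:
            conflicts.append(subject)
        # Prefer the most recent, then the most authoritative source.
        # Similarity score only breaks ties; it never outranks freshness.
        kept.append(max(group, key=lambda h: (h.updated, h.authority, h.score)))
    return kept, conflicts


hits = [
    RetrievedFact("pro-plan-price", "$40", date(2024, 3, 1), 1, 0.93),
    RetrievedFact("pro-plan-price", "$55", date(2024, 6, 10), 2, 0.88),
    RetrievedFact("support-hours", "9-5", date(2024, 1, 5), 1, 0.71),
]
kept, conflicts = resolve(hits)
print([(f.subject, f.value) for f in kept], conflicts)
```

Note that the older price has the higher similarity score. Ranking by similarity alone would have returned it, which is exactly the silent failure this obligation is meant to prevent.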

Figure: Two ways to keep a vector index fresh. On the left, a nightly batch re-embeds the whole corpus on a timer, finishes hours late, costs the size of everything, and still only fixes edited content. On the right, a change event re-embeds just the touched document in seconds, the embedding space is versioned so queries never cross models, and a drift check flags contradictions before they reach the answer.

The maintenance job nobody scheduled

Step back and notice what these three obligations have in common. None of them is a smarter model. None is a better distance metric. All three are maintenance, the unglamorous, never-finished work of keeping a copy honest as the original keeps moving.

That’s the real shape of the problem, and the reason it’s so widely missed. Retrieval is presented as a build-once capability: you “added RAG,” you “gave the agent memory,” past tense, done. But memory was never a thing you add. It’s a thing you maintain. The day you stop maintaining it is the day it starts lying to you in complete sentences, and it will keep doing so, confidently and without complaint, for as long as you let it.

The teams that get burned aren’t the ones who built retrieval badly. They built it well, shipped it, and walked away, and the index did exactly what an unmaintained index does. It froze the world on the day it was built and answered every future question from that frozen world. A vector index is not a snapshot you take once. It is a living copy of your knowledge that decays the moment the world it copied moves on. The work isn’t building it. The work is never being done.

The turn: a memory you can trust is a maintained one

Here is the part that isn’t about embeddings.

When you put a person in charge of company knowledge (the operations lead who knows where everything is, the support veteran who remembers every edge case), what you’re really trusting is not their recall. It’s their freshness. They know the price changed on Monday. They know that customer churned. They know “the deal” means something different now than it did in spring. Their value was never that they remembered a lot. It was that they remembered currently, and corrected themselves out loud when they caught a stale fact leaving their mouth.

That’s the bar an agent’s memory has to clear before you can let it act on its own. Not “can it retrieve,” but “does it know when what it retrieved went out of date.” A coworker you trust is one who keeps their own knowledge current without being told to, who re-reads the document when it changes, who notices when two things you said can’t both be true, who never confidently quotes a number from a page that was edited yesterday. The whole question of whether software can be a teammate comes down to whether its memory can be trusted, and a memory can only be trusted if it’s maintained.


This is what we’re building at Apollo: not a search box you fill once and forget, but a company brain that treats its own knowledge as a living thing, re-reading what changed, versioning how it understands, and catching itself before it repeats yesterday as today. The index that quietly goes stale is the one nobody put on the calendar. The memory you can trust is the one that never stops checking whether it’s still right.
