The backfill that ran inside a request and killed the app mid-demo

The slowest thing your agent does should never hold the connection the user is waiting on.

Apollo Space Research

Apollo Space

Someone clicks a button to refresh a dashboard. Behind that one click, an agent decides the underlying data is stale and quietly starts re-reading three years of history to fix it. The click is supposed to take a moment. The re-read takes minutes. And for those minutes, the click is still open, the browser spinner turning, the connection held hostage by a job that has nothing to do with the thing the person actually asked for.

Now imagine that click happened on a screen someone was sharing with a room full of people. The button never came back. The app looked dead. It wasn’t dead; it was busy doing something enormous that nobody had agreed to wait for.

That is the failure mode this post is about, and it is one of the most common ways an otherwise-good agent system falls over in front of a real user.

The shape of the bug: a small ask, a giant side effect

Let’s describe the trap before the fix, because the trap is what makes the fix obvious.

An agent gets a request that looks small: “Show me this account’s summary.” To answer it well, the agent notices something true and reasonable: the summary depends on data that hasn’t been refreshed in a while. A diligent agent doesn’t serve stale numbers. So it does the responsible thing and starts bringing the data up to date. It backfills.

The naive way to wire this is the way most framework tutorials wire it: do the work right there, inside the request, in the same call that’s holding the user’s connection. The agent thinks, the agent acts, the agent replies: one straight line from click to answer. It reads beautifully in a diagram. It demos perfectly on a tiny dataset.

Then a real account arrives with real history, and the “small ask” drags a giant side effect behind it. The summary needed thirty seconds of fresh data; the backfill it triggered needs to walk three years of records. The user asked one question. The system, trying to be thorough, decided to do minutes of homework first, and it decided to do all of it before saying a single word back.
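To make the trap concrete, here is a minimal sketch of the naive wiring. Every name in it (`handle_summary_request_naive`, `backfill_history`, `SECONDS_PER_YEAR_OF_HISTORY`, the shape of the `account` dict) is hypothetical, and the sleep is a scaled-down stand-in for the real re-read. The only point is that the reply cannot leave until the backfill returns.

```python
import time

# Hypothetical cost model, scaled down so the sketch runs quickly.
# A real backfill would take minutes, not hundredths of a second.
SECONDS_PER_YEAR_OF_HISTORY = 0.01


def backfill_history(years: int) -> None:
    """Stand-in for the heavy re-read; blocks the caller until it is done."""
    time.sleep(years * SECONDS_PER_YEAR_OF_HISTORY)


def handle_summary_request_naive(account: dict) -> dict:
    """Anti-pattern: the backfill runs inside the request, on the user's clock."""
    if account["stale"]:
        backfill_history(account["years_of_history"])
        account["stale"] = False
    return {"account_id": account["id"], "summary": account["summary"]}


def timed(handler, account: dict):
    """Return the handler's reply and how long the caller was kept waiting."""
    start = time.perf_counter()
    reply = handler(account)
    return reply, time.perf_counter() - start
```

Calling `timed(handle_summary_request_naive, account)` with a larger `years_of_history` keeps the caller waiting proportionally longer, even though the reply itself is identical.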

The connection a person is waiting on is the most time-sensitive resource in the whole system, and we’d handed it to the slowest job in the building.

Figure: a small user request travels one straight line through the agent and triggers a giant multi-minute backfill inside the same call, so the user's connection stays open and spinning until the heavy job finishes.

The cruelty of it is that nothing was wrong, exactly. The agent’s judgment was sound: stale data is bad, and refreshing it is good. The backfill was correct. The summary was correct. The only mistake was where the heavy work ran: in the foreground, on the user’s clock, inside the request the user was waiting on. Right work, wrong thread.

The naive fix that feels like a fix and isn’t

The first instinct, once you’ve felt this pain, is to make the slow thing faster.

So the team optimizes the backfill. They add an index, they batch the reads, they cut the three-year walk down to ninety seconds. And ninety seconds is genuinely better than five minutes, on the demo account. Everyone exhales. The button comes back before anyone in the room gets nervous.

But speed is not the property that was broken. Coupling was the property that was broken.

A ninety-second job holding a user’s connection is still a ninety-second job holding a user’s connection. You haven’t removed the hostage situation; you’ve shortened it. And the moment a slightly bigger account shows up (one with five years of history instead of three, or one whose upstream source is having a slow afternoon), ninety seconds becomes four minutes, and the app dies in front of the next person. You optimized the symptom. The disease is that a foreground request can be held open by a background-sized job at all.

Making the slow thing faster doesn’t fix it. It just raises the size of the account that breaks you.
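A rough back-of-the-envelope version of that claim, as a sketch. The five-minute and ninety-second figures for a three-year account come from the story above; the linear scaling with history size and the ten-second patience budget are assumptions made purely for illustration.

```python
def held_seconds(years_of_history: float, seconds_per_year: float) -> float:
    """How long the connection is held if the backfill runs in the request."""
    return years_of_history * seconds_per_year


def largest_safe_history(patience_seconds: float, seconds_per_year: float) -> float:
    """Largest account (in years of history) that still replies within patience."""
    return patience_seconds / seconds_per_year


# Rates implied by a three-year account, assuming cost scales linearly.
BEFORE_OPTIMIZATION = 300 / 3  # five minutes for three years
AFTER_OPTIMIZATION = 90 / 3    # ninety seconds for three years

# Assumed: how long a person waits before deciding the app is dead.
PATIENCE_SECONDS = 10.0

five_year_wait = held_seconds(5, AFTER_OPTIMIZATION)
safe_before = largest_safe_history(PATIENCE_SECONDS, BEFORE_OPTIMIZATION)
safe_after = largest_safe_history(PATIENCE_SECONDS, AFTER_OPTIMIZATION)
```

Under these assumptions the optimization moves the breaking point from about a tenth of a year of history to about a third of a year. The threshold is higher, but it still exists, and a five-year account still holds the connection for two and a half minutes.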

The slowest thing your agent does should never hold the connection the user is waiting on, and “should never” doesn’t mean “should be fast.” It means the connection and the heavy work should not be on the same thread in the first place. No amount of optimizing the backfill changes the fact that it’s standing where it must not stand.

The fix: answer now, finish later

Here is the idea, and it’s old enough to be boring, which is exactly why it’s trustworthy.

Split the request into two promises. The first promise is to the user, and it’s fast: “here is the best answer I can give you right now, and here’s a flag if part of it is still freshening.” The second promise is to the data, and it’s patient: “I have scheduled the heavy refresh, it is running somewhere that doesn’t touch your connection, and when it finishes the numbers will be right.” The user’s request returns in a moment. The backfill runs on its own time, in the background, on a worker built to do slow things slowly.

The naive line was click → think → do the giant job → reply. The fixed line is click → reply with what we have → and separately, off to the side, do the giant job. The reply no longer waits on the backfill, because the reply and the backfill are no longer the same thing happening on the same thread.

Figure: the same request splits into two paths. A fast lane returns an immediate answer to the user while a background worker picks up the heavy backfill on its own clock and reports back when the data is fresh.
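The two-path shape can be sketched with `asyncio`. This is an in-process illustration only: the names (`handle_summary_request`, `backfill`, the `account` dict) are hypothetical, the sleep stands in for the heavy re-read, and a production system would more likely hand the job to a durable queue and a separate worker than to a task in the same process.

```python
import asyncio


async def backfill(account: dict) -> None:
    """Stand-in for the heavy refresh; runs on its own clock."""
    await asyncio.sleep(0.05)
    account["summary"] = "fresh summary"
    account["stale"] = False


async def handle_summary_request(account: dict, background_tasks: set) -> dict:
    """Reply with what we have now; schedule the backfill off to the side."""
    refreshing = False
    if account["stale"]:
        task = asyncio.create_task(backfill(account))
        # Hold a reference so the task is not garbage collected mid-flight,
        # and drop it from the set once the task completes.
        background_tasks.add(task)
        task.add_done_callback(background_tasks.discard)
        refreshing = True
    return {"summary": account["summary"], "refresh_in_progress": refreshing}


async def demo():
    account = {"summary": "stale summary", "stale": True}
    background_tasks: set = set()
    reply = await handle_summary_request(account, background_tasks)
    stale_at_reply_time = account["stale"]   # the reply did not wait
    await asyncio.gather(*background_tasks)  # let the worker finish
    return reply, stale_at_reply_time, account
```

Running `asyncio.run(demo())` returns the reply while the account is still marked stale; the data becomes fresh only after the background task completes.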

The piece that makes this honest rather than a dodge is the flag. We don’t pretend the slow data is fresh when it isn’t. The fast answer tells the truth: this part is current as of an hour ago, and a refresh is running now. The user gets something useful instantly and gets told plainly that one corner is still updating. That’s the difference between answering fast and lying fast. A fast answer that hides its own staleness is worse than a slow one; a fast answer that names exactly what’s still cooking is better than both.
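One way the flag might look in a reply payload, as a sketch. The field names (`as_of`, `refresh_in_progress`) and the wording of the notice are assumptions, not a prescribed schema; what matters is that the reply carries its own age and says whether a refresh is under way.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SummaryReply:
    summary: str
    as_of: datetime            # when the underlying data was last refreshed
    refresh_in_progress: bool  # True if a background backfill is running


def freshness_notice(reply: SummaryReply, now: datetime) -> str:
    """Say plainly how old the data is and whether it is being refreshed."""
    age_minutes = int((now - reply.as_of).total_seconds() // 60)
    notice = f"Current as of {age_minutes} min ago"
    if reply.refresh_in_progress:
        notice += "; a refresh is running now"
    return notice + "."


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
reply = SummaryReply("account summary", now - timedelta(hours=1), True)
notice = freshness_notice(reply, now)
# notice == "Current as of 60 min ago; a refresh is running now."
```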

What you’ve done, mechanically, is move the heavy job off the critical path. The critical path, the line between a click and a response, should carry only the work that has to happen before you can speak. Everything else, every backfill and re-index and bulk re-read, belongs on a different clock. This is the single most useful discipline we know for keeping an agent app responsive under real load, and it has almost nothing to do with how smart the agent is.

Why agents make this trap worse, not better

You might think this is just old web-architecture advice wearing a new hat, and the bones of it are. But agents make the trap easier to fall into, for a reason worth naming.

A traditional program does what you wrote. If you didn’t write “go re-read three years of records,” it doesn’t. An agent does what it judges is needed, and good judgment, in an agent, often means deciding to do more than the literal ask. That’s the whole point of an agent: you wanted the right answer, not a literal one, so it notices the stale data and fixes it without being told. The very quality that makes the agent useful is the quality that makes it reach for the giant side effect.

So the agent that’s worth having is exactly the agent most likely to trigger a backfill you never explicitly requested. Which means you cannot rely on “we just won’t do slow things in the request.” The agent will decide to do a slow thing, on its own, because doing it is correct. Your architecture has to assume that, has to treat any agent action as something that might turn out to be enormous, and route it accordingly.

That’s why the fix can’t be a rule the agent follows. It has to be a property of the system around the agent. The agent is allowed to decide the data needs refreshing. The system is what guarantees that deciding so doesn’t hold a single user’s connection open while it happens. We built it so the agent can be as ambitious as it wants about correctness, and the runtime quietly makes sure ambition runs in the background. The intelligence proposes the heavy work; the plumbing protects the user from waiting on it.

The slowest thing your agent does should never hold the connection the user is waiting on, and with an agent, you don’t get to know in advance which thing will turn out to be the slowest. So you protect the connection from all of them.
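One possible way to make this a property of the system rather than a rule the agent follows is a reply budget enforced by the runtime. The sketch below is an illustration under assumptions, not a description of how any particular runtime works: `run_with_budget`, `REPLY_BUDGET_SECONDS`, and `agent_action` are hypothetical names. It relies on the fact that `asyncio.wait` does not cancel tasks when its timeout expires, so an action that overruns keeps running in the background.

```python
import asyncio

# Assumed budget: how long any single action may hold up the reply.
REPLY_BUDGET_SECONDS = 0.05


async def run_with_budget(action, background: set, budget: float = REPLY_BUDGET_SECONDS):
    """Await `action` for at most `budget` seconds.

    Returns (True, result) if it finished in time. Otherwise returns
    (False, None) and leaves the action running in `background`.
    """
    task = asyncio.ensure_future(action)
    done, _pending = await asyncio.wait({task}, timeout=budget)
    if task in done:
        return True, task.result()
    background.add(task)  # keep a reference while it finishes on its own clock
    task.add_done_callback(background.discard)
    return False, None


async def agent_action(seconds: float, value: str) -> str:
    """Stand-in for whatever the agent decided to do."""
    await asyncio.sleep(seconds)
    return value


async def demo():
    background: set = set()
    quick = await run_with_budget(agent_action(0.0, "small"), background)
    slow = await run_with_budget(agent_action(0.2, "enormous"), background)
    parked = len(background)
    finished = await asyncio.gather(*background)
    return quick, slow, parked, finished
```

The design choice is that the caller never has to predict which action will be slow. Every action gets the same budget, and anything that overruns is parked rather than awaited.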

Where the line actually goes

The practical version of all this is one question you ask of every action in a request: does the user have to wait for this in order to get a useful reply?

If the answer is yes (the reply genuinely can’t be formed without it), the action stays on the critical path, and you make it as fast as it needs to be. If the answer is no (the reply is useful without it, and the action just makes things better or fresher or more complete), it goes to the background, full stop, no matter how clearly correct it is to do. “Correct to do” and “must happen before we answer” are two different questions, and the bug at the top of this post came entirely from treating them as one.
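As a sketch, the question can be attached to each action as a single boolean and used to partition the work. The `Action` type, the `required_for_reply` field, and the example action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    required_for_reply: bool  # "does the user have to wait for this?"


def split_actions(actions: list) -> tuple:
    """Partition actions into the critical path and the background."""
    critical = [a for a in actions if a.required_for_reply]
    deferred = [a for a in actions if not a.required_for_reply]
    return critical, deferred


planned = [
    Action("load_cached_summary", required_for_reply=True),
    Action("backfill_three_years", required_for_reply=False),
    Action("rebuild_search_index", required_for_reply=False),
]
critical, deferred = split_actions(planned)
```

Here only `load_cached_summary` stays on the user's clock. The backfill and the re-index are both correct to do, and both are deferred.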

Draw that line once, honestly, and most of the responsiveness problems in an agent app simply stop happening. Not because anything got faster, but because the only things left on the user’s clock are the things the user is actually waiting for. The backfill that walks three years of history is still slow. It’s just slow somewhere that no one is watching a spinner.

The turn: responsiveness is a promise you keep to a person

There’s a person on the other end of every one of these requests, and what they’re really asking for isn’t data. It’s an answer that comes back when they expect it to.

The reason the dead button on a shared screen stings so much isn’t the lost minutes. It’s that the app made an implicit promise (“click me and I’ll respond”) and then broke it, silently, in front of an audience, for the most defensible reason in the world: it was busy being thorough. Users don’t forgive thoroughness they can’t see. They experience a held connection as a broken one, every time, no matter how good the work happening behind the spinner is. A system that’s slow when you’re watching it doesn’t feel diligent. It feels gone.

So the discipline isn’t really about threads and workers. It’s about respecting the difference between the work a person is waiting on and the work that can happen while they get on with their day. Keep those two apart and the app feels alive even while something enormous is grinding away underneath it. Blur them together and the best agent in the world will still, one afternoon, freeze in front of exactly the person you most wanted to impress.

The fast answer and the patient backfill are both correct. The whole craft is refusing to make the person wait on both.


That’s what we’re building at Apollo Space: an agent runtime where the system answers you the instant it can and does its heavy lifting off to the side, so ambition never costs you a spinner. If you’ve ever watched a perfectly good app go dark in front of a room because it was quietly doing too much in the wrong place, you already know why we put that line where we put it.
