Engineering

"Is it working?" has a different answer at every layer of the stack

A feature can be true in the database, true in the API, and a lie on the screen. Answer per layer.


Apollo Space Research

Apollo Space

March 18, 2026 · 10 min read

Someone asks us a four-word question that sounds like it should have a four-word answer: “Is the feature working?” We open the database, run a query, and there it is: the row exists, the data is shaped exactly right, the relationship to the user is correct. We almost say yes. Then we open the actual product the way a customer would, click to where that feature is supposed to live, and find nothing. No button. No screen. The thing is unmistakably real one layer down and unmistakably absent where a human would reach for it.

Both observations are true at the same time. That contradiction is the whole subject of this post.

A feature can be true in the database, true in the API, and a lie on the screen. The honest answer to “is it working” is never one word; it’s one answer per layer.

We learned this the way most uncomfortable lessons get learned: by confidently saying yes and being wrong in front of the person who asked.

The four-word question with a four-word trap

The question feels binary because most questions in the rest of life are. Is the light on? Is the door locked? Yes or no, one place to look.

Software isn’t one place. A single feature (say, “a user can create an agent”) is not one thing. It’s a stack of things that each have to be true, sitting on top of each other: a business requirement, a backend rule that enforces it, an API route that exposes it, a database table that stores it, the logic that makes it behave, the tests that prove the behavior, and finally the screen where a human actually does it. Seven floors, give or take. “Is it working?” is really seven questions wearing a trench coat.

Here’s the trap. Any one of those floors can be solid while the floor above it is missing, and from inside that floor everything looks finished. You can stand in the database, see the data, and have every reason to believe the building is done, while the front door doesn’t exist yet. The view from each layer is honest and incomplete in exactly the same breath.

So the naive answer (checking the layer you happen to be standing on and reporting that as the answer for the whole thing) isn’t lazy. It’s the most natural mistake in the craft. You looked. You saw truth. You reported truth. You were still wrong, because the question wasn’t about your floor.

The naive version: check the layer you can see

Let’s stage the dumb version, because we’ve shipped it, and the felt pain is the whole point.

You build a feature back-to-front, the way most features get built. The data model goes in first: it’s the foundation, it’s satisfying, and you can see rows appear. You write the rule that governs it. You add the route. At each step you verify the step: the migration ran, the route returns the right shape, a script can create the thing. Every check is green. Every check is real.

Then someone asks if it’s working, and you answer from where you’ve been living, down in the data, where everything is provably true. Yes, it works. You can create the thing. You just did it, twice, from a script.

What you didn’t notice is that “from a script” is doing enormous load-bearing work in that sentence. You created the thing by reaching directly into a floor no customer can stand on. The path a real person would take (open the app, find the screen, click the button) was never built, or was built and never wired to the floor below it. The feature is genuinely, verifiably complete everywhere except the one place a human can reach.

A feature you can only trigger from the database is a feature no customer has.

That gap doesn’t announce itself. It hides inside the word “works,” which quietly means something different on every floor. Works in the database means the data is correct. Works in the API means the route responds. Works on the screen means a person, who knows nothing about your data model, can sit down and do the thing. Those are three different claims, and the cost of treating them as one is that you tell someone yes and they go look and there’s nothing there.
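To make the three meanings of “works” concrete, here is a minimal Python sketch. Everything in it is hypothetical: `FakeProduct`, the `/agents` route, and the “Agents” screen are stand-ins invented for illustration, not a real product or API. Each check asks only about its own layer.

```python
from dataclasses import dataclass, field


@dataclass
class FakeProduct:
    """Hypothetical in-memory stand-in for a three-layer product."""
    rows: list = field(default_factory=list)      # database layer
    routes: dict = field(default_factory=dict)    # API layer: path -> handler
    screens: dict = field(default_factory=dict)   # UI layer: screen -> buttons


def works_in_database(product, agent_name):
    """The data is correct: a row for this agent exists."""
    return any(row["name"] == agent_name for row in product.rows)


def works_in_api(product, path):
    """The route responds: a handler exists and reports 201 for a probe.

    The probe row it creates is left in place in this sketch.
    """
    handler = product.routes.get(path)
    return handler is not None and handler({"name": "probe"}) == 201


def works_on_screen(product, screen, button):
    """A person can do it: the screen exists and shows the button."""
    return button in product.screens.get(screen, [])


product = FakeProduct()


def create_agent(payload):
    """Hypothetical route handler: store the row, answer 201 Created."""
    product.rows.append(payload)
    return 201


product.routes["/agents"] = create_agent
product.routes["/agents"]({"name": "first-agent"})  # created "from a script"

verdicts = {
    "database": works_in_database(product, "first-agent"),
    "api": works_in_api(product, "/agents"),
    "screen": works_on_screen(product, "Agents", "Create agent"),
}
print(verdicts)  # {'database': True, 'api': True, 'screen': False}
```

Two of the three claims hold, and the one a customer depends on does not. Nothing in the first two checks could have revealed that.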

Two ways to answer "is it working." On the left, a single green check on the database layer gets reported as a whole-stack yes, while the screen layer above it is silently empty. On the right, every layer carries its own verdict and the missing screen turns the overall answer into a no.

The failure isn’t that the lower layers lied. They told the truth about themselves. The failure is the extrapolation: taking a true verdict from one floor and stamping it on the whole stack.

Why “it passes the tests” is its own version of this trap

There’s a tempting fix that feels rigorous and isn’t enough: lean on the tests. If the tests are green, surely the feature works.

This is the same trap in better clothing. A test, like a database query, lives on a particular floor. A unit test proves a function does what the function was written to do. An integration test proves two components agree at their seam. Both can be flawless while the screen a customer uses doesn’t exist, because the tests, like the script, reach into the layers underneath the human, not through the door the human walks. Green tests written below the front end tell you the plumbing is sound. They don’t tell you the building is enterable.

This happens to every team running a fast fleet of builders, human or agent, and it happens precisely because everyone is diligent. Diligence at the layer you’re on produces a confident green that doesn’t generalize upward. The more thorough you are inside one floor, the more convincing the floor’s “done” feels, and the easier it is to forget there’s a floor above it nobody walked.

The discipline that kills it is almost insultingly simple, and we’ll state it as plainly as we can: the only verdict that counts for the whole feature is the one taken from the top floor. A human, or an agent standing in for one, opens the deployed product, takes the path a customer would take, and the feature behaves. That’s the answer. Everything below it is necessary, every green check matters, and none of them gets to speak for the layer above it. You verify each floor on its own terms, and you let the front door cast the deciding vote.
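One way to picture the top-floor check is as an ordered walk along the customer’s path that stops at the first step a person could not complete. The sketch below assumes a hypothetical `deployed` dictionary in place of a real deployed product, and the step names and paths are made up. In practice this walk would more likely be done by a person or a browser-automation tool; the shape of the check is what matters here.

```python
def walk_front_door(steps):
    """Run customer-path steps in order and stop at the first one that fails.

    Returns (True, None) if every step passes, else (False, failed_step_name).
    """
    for name, check in steps:
        if not check():
            return False, name
    return True, None


# Hypothetical deployed product: the API route exists, the screen does not.
deployed = {
    "pages": {"/home"},
    "buttons": {},                 # page path -> list of button labels
    "api_routes": {"/agents"},
}

customer_path = [
    ("open the app", lambda: "/home" in deployed["pages"]),
    ("find the agents screen", lambda: "/agents/new" in deployed["pages"]),
    (
        "click 'Create agent'",
        lambda: "Create agent" in deployed["buttons"].get("/agents/new", []),
    ),
]

ok, blocked_at = walk_front_door(customer_path)
print(ok, blocked_at)  # False find the agents screen
```

The walk never looks at `api_routes`. That is deliberate: the lower floors get their own checks, and this one only answers whether the door opens.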

Answer per layer, out loud, every time

So when someone asks us “is it working” now, we’ve trained ourselves out of the one-word reply. The honest answer walks the stack, floor by floor, and says what’s true and what isn’t at each one.

The requirement is clear: yes. The backend rule enforces it: yes. The route exposes it: yes. The data stores it correctly: yes. The logic behaves: yes. The tests prove the behavior: yes. And the screen a customer uses to do it: not yet. That means the answer to the only question that actually matters to a customer is no. Six yeses and one not-yet still sum to “a person can’t do this,” because the floors are stacked, not averaged. A chain isn’t as strong as its average link; it is as strong as its weakest one.
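The difference between stacking and averaging is easy to show in a few lines. This sketch uses the seven floors named earlier with illustrative pass/fail values; the layer names and the data are assumptions for the example, not output from a real system.

```python
# One verdict per floor, bottom to top. Values are illustrative.
stack = [
    ("requirement", True),
    ("backend rule", True),
    ("route", True),
    ("data", True),
    ("logic", True),
    ("tests", True),
    ("screen", False),
]

# Averaging makes the feature look almost done.
average = sum(passed for _, passed in stack) / len(stack)

# Stacking asks whether every floor holds, which is what a customer needs.
customer_can_do_it = all(passed for _, passed in stack)

# Name the floors that block the answer instead of hiding them in a score.
blocking = [name for name, passed in stack if not passed]

print(f"average of checks: {average:.0%}")              # average of checks: 86%
print(f"a person can do this: {customer_can_do_it}")    # a person can do this: False
print(f"blocked by: {blocking}")                        # blocked by: ['screen']
```

An 86% score and a “no” describe the same stack. Only the second one matches what the person asking will find when they go look.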

A feature can be true in the database, true in the API, and a lie on the screen. Said the long way: the truth of a feature is not a single value, it’s a column of values, and the customer only ever experiences the top of the column. Everything underneath is how you got there. None of it is the arrival.

A feature traced as a vertical stack of layers (requirement, backend, route, data, tests, behavior, screen), each carrying its own pass or pending verdict, with a rule that the customer only ever experiences the topmost layer, so the topmost verdict is the real answer.

What this buys is the end of a specific, recurring lie: the cheerful yes that sends someone to a screen that isn’t there. What it costs is the comfort of a short answer. You can’t say “done” anymore until you’ve stood on the top floor and walked through the door yourself. That’s a higher bar, and it’s slower, and it’s the only bar that matches what the word “working” means to the only person who counts.

“Done” is a verdict you earn at the front door, not a sum of green checks behind it.

This is why, when we build with a fleet of agents that move fast, “is it working?” is never answered by extrapolation from the layer a builder happened to verify. The vertical gets walked, top to bottom, and every floor reports for itself. An agent that creates a row and reports the feature shipped has done real work and made the oldest mistake in the craft: it answered for a floor it wasn’t standing on.
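A sketch of what “every floor reports for itself” could look like when several builders hand in results. The report format, the `merge_reports` and `is_shipped` helpers, and the verdict strings are all hypothetical, chosen for illustration rather than taken from any real tooling. The point is that a layer nobody verified stays unverified instead of inheriting a pass from the layer below.

```python
LAYERS = ("requirement", "backend rule", "route", "data", "logic", "tests", "screen")


def merge_reports(reports):
    """Combine per-builder reports into one verdict per layer.

    Layers nobody verified stay "unverified"; they are never inferred
    from a pass on a neighbouring layer.
    """
    merged = {layer: "unverified" for layer in LAYERS}
    for report in reports:
        for layer, verdict in report.items():
            if layer not in merged:
                raise ValueError(f"unknown layer: {layer}")
            # Once a layer has a non-pass verdict, a later pass does not erase it.
            if merged[layer] in ("unverified", "pass"):
                merged[layer] = verdict
    return merged


def is_shipped(merged):
    """Shipped only when every layer, including the screen, reports a pass."""
    return all(verdict == "pass" for verdict in merged.values())


reports = [
    {"data": "pass"},                            # builder that created the row
    {"backend rule": "pass", "route": "pass"},   # builder that wired the route
]
merged = merge_reports(reports)
unverified = [layer for layer in LAYERS if merged[layer] == "unverified"]

print(is_shipped(merged))  # False
print(unverified)          # ['requirement', 'logic', 'tests', 'screen']
```

The builder that created the row did real work, and its report says exactly that and nothing more. The unverified list is the to-do list for whoever walks the rest of the stack.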

The turn: the question is really “can a person do this yet?”

We want to end on the human under the stack, because that’s what every layer is finally about.

When the person asking “is it working?” pictures the answer, they are not picturing a row in a table or a green dot on a test runner. They’re picturing themselves, or a customer, or a colleague who is having a hard week, sitting in front of the product and trying to get something done. That image is the real specification. The database, the route, the tests: all of it exists to deliver that one person a screen that does what they came for. When we answer from the database, we’ve quietly swapped their question for an easier one we can win. The honest version refuses the swap. It keeps the human at the top of the stack in frame and reports against their experience, even when that turns our satisfying six-yeses into a no.

There’s a kind of respect in that refusal. It says: we will not tell you a thing is yours until you can actually reach it. Most of the discipline in good engineering is some version of that promise: not confusing the work you can see with the outcome someone else is waiting on. The layers are how the system is built. The person at the top is who it’s built for, and they don’t experience any layer but the last.

A feature can be true in the database, true in the API, and a lie on the screen. Which is only a problem if you forget who the truth is supposed to reach.

That’s the standard we hold at Apollo Space: a feature isn’t working until someone can do the thing through the front door, and “yes” is a sentence you earn one layer at a time. If you’ve ever been told something was ready and then stared at a screen that didn’t have it, you already know which layer was telling the truth, and which one was telling you the answer.
