Put a deterministic gate in front of your smartest reviewer

The cheapest defect-catch is a dumb script that checks two merged branches still boot before any judgment.

Apollo Space Research

Apollo Space

June 1, 2026 · 10 min read

Two branches passed review. Each one, on its own, was correct, tested, read by a careful reviewer, merged clean. Then they landed in the same place and the application wouldn’t start. One had renamed a config key; the other still read the old name. Neither diff was wrong. The pair was. And the reviewer who would have caught it was busy reading a third branch, because the thing that should have stopped this never ran: a script too dumb to have an opinion, whose only job is to merge the two branches in a scratch copy and check the result still boots.

That script costs almost nothing to run. The bug it would have caught cost an afternoon.

This post is about why the dumbest check in your pipeline belongs in front of your smartest reviewer, human or agent, and why teams keep getting that order backwards.

The naive version: judgment first, integration whenever

The obvious pipeline puts the expensive brain first. A change arrives. A skilled reviewer reads it (these days that reviewer might be a careful agent, but the shape is the same). They check the logic, question the edge cases, approve. The change merges. Somewhere later, maybe on a nightly build, maybe when the next person pulls main, the system actually tries to run with everything combined.

This works beautifully when changes arrive one at a time. It falls apart the instant they arrive in parallel.

The failure has a specific shape, and it isn’t a bad reviewer. Each reviewer sees one branch against a main that doesn’t yet contain the other branch in flight. So each one approves a change that is individually correct against a base that is already stale. The conflict isn’t inside either diff; it lives in the space between them: in the renamed key, the moved import, the migration that assumes a column the other migration just dropped. No amount of reviewing one branch reveals it, because the evidence isn’t in that branch. It’s in the sum.
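To make the renamed-key case concrete, here is a toy sketch. Everything in it is hypothetical: the config is a plain dict, each "reader" is a function that looks up one key, and the key names `db_url` and `database_url` are invented for illustration. It only shows the shape of the failure, not any real codebase.

```python
# Toy model of the renamed-key collision. All names here are hypothetical.

def boots(config, readers):
    """Boot succeeds only if every reader finds the key it asks for."""
    try:
        for read in readers:
            read(config)  # raises KeyError when the key is missing
    except KeyError:
        return False
    return True


# main: one config key, one reader that uses it.
main_config = {"db_url": "postgres://localhost/app"}
main_readers = [lambda c: c["db_url"]]

# Branch A renames the key and updates the only reader that exists on main.
a_config = {"database_url": "postgres://localhost/app"}
a_readers = [lambda c: c["database_url"]]

# Branch B leaves the config alone and adds a second reader of the old name.
b_config = dict(main_config)
b_readers = main_readers + [lambda c: c["db_url"]]

# A clean textual merge: A's config edit applies (B never touched that file),
# A's rewritten reader replaces main's, and B's new reader is simply added.
merged_config = a_config
merged_readers = a_readers + [b_readers[-1]]

print("A alone boots:", boots(a_config, a_readers))            # True
print("B alone boots:", boots(b_config, b_readers))            # True
print("A + B boots:  ", boots(merged_config, merged_readers))  # False
```

Neither branch contains a line a reviewer could flag. The `False` only appears once the two are combined.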

A change can be correct and still be wrong about what it’s being merged into.

So the smart reviewer signs off in good faith, the merge goes through, and the breakage surfaces downstream, in a build everyone has stopped watching, or worse, on someone’s machine an hour later when they pull and nothing starts. The judgment was sound. The thing judgment can’t see got through anyway. And it got through because the only check that would have caught it was scheduled to run after the decision instead of before it.

The dumb check that has to go first

Here is the inversion. Before any reviewer, person or agent, spends a second of attention on a change, run the cheapest possible test: take the change, combine it with everything else currently headed for main, and check that the result installs, boots, and passes a thirty-second smoke test. No judgment. No taste. A pass/fail that a shell script can produce.

That check has no opinion about whether the code is good. It only answers one question: does the world still start with this in it?
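As a rough sketch of how small that check can be, the snippet below builds the list of commands a gate might run and executes them in order, failing on the first non-zero exit. It assumes a Git repository and uses `git worktree add --detach` for the scratch copy. The branch names, the scratch path, and the `./scripts/smoke.sh` command are all hypothetical placeholders, and the per-step time budget is an assumption borrowed from the "thirty-second" figure above.

```python
import subprocess

STEP_BUDGET_SECONDS = 30  # assumed per-step budget, not a measured value


def plan_gate(base, in_flight, scratch_dir, smoke_cmd):
    """Return the gate's steps, in order, as (argv, working_directory) pairs."""
    # Step 1: create a scratch checkout of the base, leaving the real one alone.
    plan = [(["git", "worktree", "add", "--detach", scratch_dir, base], None)]
    # Step 2: merge every in-flight branch into the scratch checkout.
    for branch in in_flight:
        plan.append((["git", "merge", "--no-edit", branch], scratch_dir))
    # Step 3: run the smoke command against the merged result.
    plan.append((list(smoke_cmd), scratch_dir))
    return plan


def run_plan(plan, timeout=STEP_BUDGET_SECONDS):
    """Run each step; a non-zero exit, a timeout, or a missing tool fails the gate."""
    for argv, cwd in plan:
        try:
            result = subprocess.run(argv, cwd=cwd, timeout=timeout)
        except (subprocess.TimeoutExpired, OSError):
            return False
        if result.returncode != 0:
            return False
    return True


# Build and print a plan without executing it. Every name below is hypothetical.
example = plan_gate(
    base="main",
    in_flight=["rename-config-key", "add-report-job"],
    scratch_dir="/tmp/gate-scratch",
    smoke_cmd=["./scripts/smoke.sh"],
)
for argv, cwd in example:
    print(cwd or ".", "$", " ".join(argv))
```

A real gate would also clean up the scratch worktree afterwards and would need a policy for merge conflicts, which this sketch simply treats as a failure. The point is only that the verdict is the exit status of a fixed list of commands.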

Figure: Two pipeline orderings compared. On the left, a change goes straight to the smart reviewer, gets approved against a stale base, merges, and the integration break surfaces downstream where no one is watching. On the right, a deterministic boot-and-smoke gate runs first on the merged result, fails fast and cheap on the pair that does not boot, and only a green change reaches the reviewer.

The reason this belongs first is economic, and it’s the whole argument. Reviewer attention, human or model, is the most expensive, most interruptible resource in the pipeline. A deterministic boot check is the cheapest. When you put the cheap check first, every change that fails it never reaches the expensive one. You don’t ask your best reviewer to read a branch that can’t even start; you hand them only branches that already stand up. The dumb gate isn’t competing with the smart reviewer. It’s clearing the floor so the smart reviewer is never wasted on a problem a script could see.
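The arithmetic is easy to run for yourself. Every number below is an illustrative assumption rather than a measurement: a one-minute gate, a thirty-minute review, a 240-minute cleanup standing in for "an afternoon", and ten changes in every hundred that break only when merged. The model also ignores the second pass a bounced change takes after it is fixed, so treat it as a back-of-envelope comparison only.

```python
# Illustrative assumptions only; substitute your own numbers.
GATE_MINUTES = 1        # one run of the boot-and-smoke gate
REVIEW_MINUTES = 30     # one careful review, human or agent
CLEANUP_MINUTES = 240   # debugging a break that surfaced downstream
CHANGES = 100           # changes arriving in some period
BREAKING = 10           # of those, how many break only when merged


def review_first_minutes():
    """Every change is reviewed; each integration break is paid for downstream."""
    return CHANGES * REVIEW_MINUTES + BREAKING * CLEANUP_MINUTES


def gate_first_minutes():
    """Every change is gated; only the changes that boot are reviewed."""
    return CHANGES * GATE_MINUTES + (CHANGES - BREAKING) * REVIEW_MINUTES


print("review first:", review_first_minutes(), "minutes")  # 5400
print("gate first:  ", gate_first_minutes(), "minutes")    # 2800
```

Under these assumptions the gate pays for itself many times over, and the gap widens as either the review cost or the break rate goes up.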

The cheapest defect-catch is a dumb script that checks two merged branches still boot before any judgment. Notice what kind of defect it catches: not the subtle logic bug (that’s the reviewer’s job) but the integration break, the one that no single-branch review can ever see, because it only exists once the branches are combined.

Why “the reviewer is smart enough” is the trap

There’s a tempting objection, and it’s worth saying out loud because it’s the reason teams skip the gate: “Our reviewer is good. A careful enough reviewer, or a capable enough agent, would catch the conflict.”

It sounds right. It’s wrong in a way that’s easy to miss.

A reviewer reads what’s in front of them. What’s in front of them is one branch. To catch a two-branch integration break by judgment alone, the reviewer would have to hold every other in-flight branch in their head simultaneously, mentally merge them, and simulate the boot, for every change, every time, while also doing the actual reviewing. That’s not a smarter reviewer. That’s a reviewer doing a deterministic machine’s job by hand, badly, under load. The smarter the reviewer, the more expensive it is to waste them on it.

This is the part people get backwards. Making the reviewer better does not remove the need for the gate; it raises the cost of not having one. A more capable reviewer is more expensive to interrupt, more valuable to keep focused, more wasteful to point at a problem that has a deterministic answer. The better your judgment layer gets, the more it pays to never spend it on something a script settles for free.

The smarter your reviewer, the more it costs to spend them on a question a script can answer.

So the gate isn’t a crutch for weak review. It’s what lets strong review stay strong, undistracted, pointed only at the questions that actually need a mind. You don’t put the dumb check first because your reviewer is dumb. You put it first because your reviewer is smart, and smart is too expensive to waste.

What the gate must be, to be trusted

A gate is only worth putting first if its verdict is believed. The moment a green gate can hide a real break, people start re-checking by hand, and the gate is dead weight. So the design constraints are strict, and they’re all about being trustworthy rather than being clever.

It has to be deterministic. Same inputs, same verdict, every run. A gate that’s green on Tuesday and red on Wednesday for the same code teaches everyone to ignore it. Flaky is worse than absent, because absent at least doesn’t lie.

It has to test the merged result, not each branch alone. The entire point is the space between branches. A gate that checks branches in isolation reproduces the exact blind spot it was meant to close.

It has to be fast and dumb on purpose. The gate runs on every change, ahead of everything, so it has to cost almost nothing. The instant it grows opinions (style, architecture, “is this the right approach”), it slows down, it goes flaky, and it starts overlapping the reviewer’s real job. Keep it boring. Boot, install, smoke, a real end-to-end path or two. Pass or fail. Nothing it says requires a human to interpret.
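One way to audit the first of those constraints is to run the same check several times on the same inputs and see whether it ever disagrees with itself. The helper below is a hypothetical sketch: `stable_verdict` and its three-run default are invented for illustration, and it is meant for periodically checking the gate itself, not for tripling the cost of every change.

```python
def stable_verdict(check, runs=3):
    """Run a zero-argument check several times on unchanged inputs.

    Returns "pass" or "fail" when every run agrees, and "flaky" when the
    runs disagree, which points at a defect in the gate rather than the code.
    """
    results = {bool(check()) for _ in range(runs)}
    if len(results) > 1:
        return "flaky"
    return "pass" if results.pop() else "fail"


# Three stand-in checks: always green, always red, and one that flips.
flips = iter([True, False, True])
print(stable_verdict(lambda: True))         # pass
print(stable_verdict(lambda: False))        # fail
print(stable_verdict(lambda: next(flips)))  # flaky
```

A "flaky" result is worth treating as an outage of the gate, since a verdict that changes without the inputs changing is exactly what teaches people to ignore it.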

Figure: A funnel. Every parallel change enters at the top, the deterministic gate at the neck passes only what boots when merged, and below it the smart reviewer and the final taste check spend their attention only on changes that already stand up.

Get those three right and the gate earns the one thing that makes it useful: a green from it means something, so nobody re-does its work. The reviewer below it can trust that anything reaching them already boots, and spend their whole attention on the questions a boot check can’t answer: is this correct, is this clear, is this the right thing to build. The dumb gate and the smart reviewer aren’t redundant. They’re a division of labor, and the order is the design.

What this looks like when changes come from agents

We care about this acutely because at Apollo, a lot of those parallel changes come from agents, and agents make the integration problem sharper, not softer.

A single careful engineer opening one pull request a day rarely collides with themselves. A fleet of agents working in parallel, each in its own isolated copy of the repository, each individually correct, each green on its own tests, collides constantly, because that’s what parallelism is. Two agents touch the same config from two different tasks. Both are right. Both merge. The pair doesn’t boot. The smartest reviewer-agent in the world, reading one branch, cannot see it, for the same reason a human can’t: the conflict isn’t in the branch, it’s in the sum.

So we put the deterministic gate first, ahead of every reviewer in the chain, human or agent. Before a change reaches the agent whose job is to attack it, before it reaches the agent that guards the standard, before any judgment is spent at all, a script merges it with everything else in flight and checks that the result still starts. The expensive judgment (and agent judgment is expensive, in tokens and in latency) is never spent on a change that can’t clear the cheapest bar there is.
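The ordering itself fits in a few lines. The sketch below is not Apollo’s pipeline; it is a minimal illustration in which the gate and the reviewers are plain functions, each change is a dict with a made-up `boots_when_merged` flag, and the two reviewer names are hypothetical stand-ins for the attacking agent and the standard-guarding agent described above.

```python
def run_pipeline(changes, gate, reviewers):
    """Gate each change first; spend reviewer calls only on changes that pass."""
    approved, bounced, rejected, reviews_spent = [], [], [], 0
    for change in changes:
        if not gate(change):
            bounced.append(change["id"])  # no reviewer ever sees this change
            continue
        for review in reviewers:
            reviews_spent += 1
            if not review(change):
                rejected.append(change["id"])
                break
        else:
            approved.append(change["id"])  # every reviewer said yes
    return {
        "approved": approved,
        "bounced": bounced,
        "rejected": rejected,
        "reviews_spent": reviews_spent,
    }


# Hypothetical stand-ins for the gate and the two reviewing agents.
def gate(change):
    return change["boots_when_merged"]

def attacker(change):
    return True

def standard_guard(change):
    return True


changes = [
    {"id": "c1", "boots_when_merged": True},
    {"id": "c2", "boots_when_merged": False},
    {"id": "c3", "boots_when_merged": True},
]
print(run_pipeline(changes, gate, [attacker, standard_guard]))
```

With three changes and two reviewers, only four reviews are spent instead of six, because `c2` is bounced before anyone reads it.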

The cheapest defect-catch is a dumb script that checks two merged branches still boot before any judgment. With a fleet, that’s not an optimization. It’s the only thing standing between “parallel” and “constantly broken main.”

The turn: someone has to defend the boring check

It is strangely hard to defend a dumb check on a team of smart people.

Every instinct on a strong engineering team pushes the other way. The reviewer is sharp, the agents are capable, the temptation is always to lean on judgment, to say “we’re good enough to catch that,” and skip the boring gate that has no opinion and adds a step. The gate feels like distrust. It feels like the thing you need only if your people aren’t good. So the person who insists on running it first, who refuses to let a change reach review until a script confirms the merged world boots, can look like they don’t trust the team.

They’re doing the opposite. They’re protecting the team’s best resource, its attention, from being spent on a problem beneath it. The discipline isn’t “we don’t trust the reviewer.” It’s “the reviewer is too good to waste on something a script settles for free, so we’re going to make sure a script settles it first.” That’s not a tooling preference. It’s a respect for where judgment should and shouldn’t go, and someone on every team has to be willing to hold that line when the room is full of people who’d rather just trust themselves.

The check stays dumb. The reviewer stays sharp. And the order between them, cheap-and-certain before expensive-and-wise, is a choice a person makes, on purpose, against the grain of a smart team’s instincts.

That’s the discipline we build into Apollo Space: let the dumb, certain checks run first so the expensive, careful judgment is never spent on a problem beneath it. If your best reviewer keeps getting pulled off real questions to debug a main that two correct branches quietly broke, the thing missing isn’t a smarter reviewer; it’s the boring little gate nobody wanted to defend.
