Sometimes the right number of agents is one
Under a fixed budget with clean context, a single agent beats a swarm; fan out only when domain isolation is a hard requirement.
Apollo Space Research
Give one task a budget and two ways to spend it. Spend it all on a single agent that reads the whole problem, holds it in one head, and works it end to end. Or split the same budget across five agents, each handed a slice, none of them seeing the others’ work until a sixth agent staples the pieces together at the end. The second setup looks more serious. It has a diagram. It has a swarm.
It also, more often than people expect, loses.
Under a fixed budget with clean context, a single agent beats a swarm; fan out only when domain isolation is a hard requirement.
This post is about when to add agents and when adding them makes the result worse, and why the instinct to fan out is, most of the time, an instinct to spend your budget on coordination instead of thinking.
The naive version: more agents must mean more capability
The obvious move, the one every multi-agent demo trains you toward, is to treat agents like workers. You have a big job, so you hire a crew. Split the job into five parts, give each part to its own agent, run them at the same time, and collect the results. More hands, more done. It feels like the whole reason parallelism exists.
It demos beautifully. Then you run it on a real task with a real budget and the math turns on you.
Here is the failure, and it is not that any single agent is bad. Each one does its slice fine. The failure is that you spent your fixed budget five ways, so each agent got one-fifth of the room to think, and then you spent more of the budget on the part no single agent had to do before: stitching. Agent three made an assumption about the data shape. Agent one made the opposite assumption. Neither saw the other, because the whole point of splitting was that they didn’t. Now the joining agent isn’t combining finished work; it’s a referee discovering, at the end, that the pieces don’t fit. The contradiction surfaces last, when it is most expensive to fix, instead of first, when one mind would have caught it in passing.
A single agent on the same budget never has that problem, because there is no seam. It read the data shape once and every later decision inherited it. The assumption that broke the swarm was never a question for the solo agent; it was just context it already held. The swarm didn't fail because the agents were weak. It failed because splitting the work split the understanding, and understanding doesn't survive being cut into fifths.
So the first rule we build to is almost rude in its simplicity: don’t fan out to look serious. Fan out when staying together is the thing that’s actually impossible.
It’s worth dwelling on why the contradiction surfaces last instead of first, because that timing is the whole cost. When one mind works a problem, every decision it makes becomes a constraint on the next one. It can’t quietly contradict itself, because the earlier choice is still sitting in its context, vetoing the incompatible later one. The check is continuous and free. Split the work, and you remove that running check. Each agent is internally consistent and globally blind. The contradictions don’t go away; they go silent, accumulating until the joining step rips the lid off and finds five locally correct answers that don’t add up to one correct whole. You didn’t avoid the cost of consistency by parallelizing. You deferred it to the most expensive possible moment and made it someone else’s emergency.
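A toy model makes the timing concrete. In this sketch, which is illustrative and not how any particular agent framework works, a "decision" is just one key/value assumption, such as the data shape. The single agent checks each new decision against everything it already holds; the split version gives each slice its own context and only compares at the join. `Decision`, `solo_run`, and `swarm_run` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical unit of work: an agent commits to one assumption."""
    step: int   # when the decision is made
    key: str    # what it is about, e.g. "date_format"
    value: str  # what was assumed, e.g. "ISO-8601"


def solo_run(decisions: list[Decision]) -> int | None:
    """One shared context: each decision is checked against every earlier one.

    Returns the step at which the first contradiction is caught, or None.
    """
    context: dict[str, str] = {}
    for d in decisions:
        if d.key in context and context[d.key] != d.value:
            return d.step  # the earlier choice vetoes the later one, in passing
        context[d.key] = d.value
    return None


def swarm_run(slices: list[list[Decision]], join_step: int) -> int | None:
    """One context per slice: cross-slice conflicts surface only at the join.

    Returns join_step if any two slices disagree on a key, or None.
    (Each slice is assumed internally consistent; within a slice, later values win.)
    """
    merged: dict[str, str] = {}
    for slice_ in slices:
        local = {d.key: d.value for d in slice_}  # blind to every other slice
        for key, value in local.items():
            if key in merged and merged[key] != value:
                return join_step  # found last, by the referee
            merged[key] = value
    return None


work = [
    Decision(1, "date_format", "ISO-8601"),
    Decision(2, "rows", "per-event"),
    Decision(3, "date_format", "unix-epoch"),  # contradicts step 1
    Decision(4, "output", "csv"),
]
print(solo_run(work))                                # 3
print(swarm_run([work[:2], work[2:]], join_step=5))  # 5
```

Same four decisions, same contradiction. The solo run flags it at step 3, where it appears; the split run cannot see it until step 5, after all the work is already done.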
Coordination is a tax, and you pay it whether or not you needed to
There’s a cost to running agents in parallel that the demos never show, because in a demo it rounds to zero. On a real task it doesn’t.
Every time two agents have to agree on something, that agreement costs budget. Someone has to specify the slice for agent four precisely enough that it doesn’t wander. Someone has to read five outputs and notice the two that conflict. Someone has to re-run agent two because it solved a slightly different problem than the one it was handed. None of that effort moves the task forward. It exists only because you split the task in the first place. It is overhead you created and then have to pay down.
The cruel part is that the tax scales the wrong way. Two agents have one seam between them. Three agents have three. Five agents have ten possible pairs that might disagree, and a joining step that has to reconcile all of them. In general, n agents have n(n - 1)/2 pairs, so the seams grow roughly with the square of the headcount. Add agents to go faster, and past a point you’ve added more coordinating than working. The curve where headcount stops helping is the same curve a manager learns the hard way when a team of three ships more than a team of nine.
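To see how fast the tax grows, here is a sketch that counts the pairs and runs them through a deliberately crude cost model. The model is our assumption for illustration, not a measurement: a fixed budget, a flat coordination cost per pair, and whatever is left split evenly across the agents. `seams` and `thinking_budget_per_agent` are hypothetical names.

```python
def seams(n: int) -> int:
    """Pairs of agents that might disagree: n choose 2."""
    return n * (n - 1) // 2


def thinking_budget_per_agent(total: float, n: int, cost_per_seam: float) -> float:
    """Budget left per agent for actual work, under a toy cost model.

    Assumption (illustrative only): every pair of agents costs a flat
    `cost_per_seam` to keep consistent (specifying slices, reconciling
    outputs, re-runs), paid from the same fixed `total` before the rest
    is split evenly across the n agents.
    """
    if n < 1:
        raise ValueError("need at least one agent")
    remaining = total - cost_per_seam * seams(n)
    return max(remaining, 0.0) / n


for n in (1, 2, 3, 5, 9):
    print(n, seams(n), round(thinking_budget_per_agent(100.0, n, 2.0), 2))
```

With a budget of 100 and a cost of 2 per seam, one agent keeps all 100; five agents get 16 each instead of an even 20; nine agents get about 3. The exact numbers mean nothing, but the shape holds under any positive per-pair cost: the overhead term grows faster than the headcount.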
A swarm doesn’t fail loudly. It just quietly spends your budget on agreeing with itself.
Under a fixed budget with clean context, a single agent beats a swarm. Not because parallelism is bad, but because it is expensive, and most tasks can’t afford to buy something they don’t need.
When clean context is the whole game
The phrase doing the real work in our rule is “clean context,” and it’s worth being concrete about what it means, because it’s the hinge the whole decision turns on.
Clean context means one agent can hold everything the task needs without the holding itself becoming the bottleneck. The problem fits. The relevant facts fit. The constraints fit. When that’s true, splitting the task is pure loss: you take a problem that fit in one head and pay to scatter it across several that now have to reassemble it. Every demo where the solo agent quietly outperforms the swarm is a demo where the context was clean and nobody needed to admit it.
Context stops being clean for exactly one reason: the task genuinely contains pieces that must not share a head. Not “would be tidier apart.” Must not. A reviewer that has to be skeptical of code cannot be the same mind that wrote the code and is proud of it. That’s not a budget question; it’s a structural one, and we’ve written before about why the hardest reviewer on the team is the one with no stake in the work. A retrieval agent that must read a hostile document cannot share a context window with the agent holding your secrets, because the moment they share a window, the hostile text can reach the secret. Those are real seams. They’re worth paying for.
The mistake is treating every boundary like one of those. Most boundaries in a task aren’t safety walls or skepticism walls. They’re just lines someone drew on a whiteboard to make the work look parallelizable. Drawing the line doesn’t make the line load-bearing. And a line that isn’t load-bearing is just a place you’ve agreed to pay the coordination tax for nothing.
So the test we apply before fanning out is a single question, and it has to clear a high bar to pass: is there a reason these pieces cannot live in the same mind (a correctness reason, a security reason, a too-big-to-fit reason), or do they merely look separable? If it’s the second one, the right number of agents is one.
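The test is small enough to write down. This is a minimal sketch in our own terms: `Reason`, `LOAD_BEARING`, and `should_fan_out` are hypothetical names, and deciding which reason applies to a given boundary is still the judgment call. The code only enforces the rule that "looks separable" never counts.

```python
from __future__ import annotations

from enum import Enum


class Reason(Enum):
    """Hypothetical labels for why a proposed boundary exists."""
    CORRECTNESS = "correctness"          # e.g. reviewer vs. author
    SECURITY = "security"                # e.g. hostile input vs. secrets
    TOO_BIG_TO_FIT = "too_big_to_fit"    # genuinely overflows one context
    LOOKS_SEPARABLE = "looks_separable"  # tidier apart, nothing more


# Only these reasons make a boundary load-bearing.
LOAD_BEARING = frozenset({Reason.CORRECTNESS, Reason.SECURITY, Reason.TOO_BIG_TO_FIT})


def load_bearing_boundaries(proposed: dict[str, Reason]) -> list[str]:
    """Keep only the boundaries that clear the bar; drop whiteboard lines."""
    return [name for name, reason in proposed.items() if reason in LOAD_BEARING]


def should_fan_out(proposed: dict[str, Reason]) -> bool:
    """Default is one agent; fan out only if some boundary is load-bearing."""
    return bool(load_bearing_boundaries(proposed))


# A feature split "by file" proposes boundaries, but none of them bear load.
by_file = {
    "api.py | models.py": Reason.LOOKS_SEPARABLE,
    "models.py | views.py": Reason.LOOKS_SEPARABLE,
}
print(should_fan_out(by_file))                                    # False
print(should_fan_out({"author | reviewer": Reason.CORRECTNESS}))  # True
```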
The “too-big-to-fit” case deserves its own honesty, because it’s the one people reach for to justify a swarm they wanted anyway. Yes, some tasks genuinely overflow a single context: a codebase too large to read at once, a corpus too wide to summarize in one pass. But the moment you split for size, you’ve taken on the coordination tax on purpose, and the right move is to pay it as cheaply as possible: split along the natural boundaries that already exist in the problem, not the convenient ones. A repository splits cleanly by module because the modules were designed not to know about each other. A research corpus splits cleanly by source. Splitting a single tightly coupled function across three agents because it’s “long” splits something that was never meant to come apart, and you’ll pay for that at the seam, every time. Size can force a fan-out. It doesn’t excuse a careless one.
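For the size case, splitting along a natural boundary can be as plain as grouping a repository's files by top-level directory instead of cutting them by length. The sketch below assumes top-level directories are a fair proxy for module boundaries, which holds in many repositories but not all; `split_by_module` is a hypothetical helper, and it never splits a single file, however long.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import PurePosixPath


def split_by_module(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths into shards by their top-level directory.

    Assumption: top-level directories were designed not to know about each
    other, so each shard can go to one agent with little cross-talk.
    Files at the repository root share a "<root>" shard. A file is never
    split across shards.
    """
    shards: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        parts = PurePosixPath(p).parts
        module = parts[0] if len(parts) > 1 else "<root>"
        shards[module].append(p)
    return {module: sorted(files) for module, files in sorted(shards.items())}


repo = [
    "billing/invoice.py",
    "billing/tax.py",
    "auth/session.py",
    "README.md",
    "auth/tokens.py",
]
print(split_by_module(repo))
# {'<root>': ['README.md'], 'auth': ['auth/session.py', 'auth/tokens.py'],
#  'billing': ['billing/invoice.py', 'billing/tax.py']}
```

Three shards, each following a line the codebase already drew. A length-based splitter would happily cut `billing/invoice.py` in half; this one cannot.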
The architecture: one mind by default, seams only where they’re real
Here is the shape we actually build to, and it is deliberately boring at the center.
By default, a task is one agent with the full budget and the whole problem. It plans, it works, it carries every decision forward without ever re-explaining itself to a colleague, because it has no colleague. That’s not us under-using the swarm. That’s us refusing to pay a tax the task didn’t levy.
We add a second agent only when a seam is real, and we add it across that seam and nowhere else. The adversarial reviewer is a separate agent because skepticism can’t share a head with authorship: one seam, justified, paid. The sandboxed agent that reads untrusted input is separate because the untrusted text must never touch the privileged context: one seam, justified, paid. We don’t split the writing of a feature across three agents because the feature has three files. Files aren’t seams. The understanding that connects them is the asset, and we don’t cut the asset to make a chart look busier.
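One way to hold yourself to "across that seam and nowhere else" is to build each agent's context explicitly and check what crosses. In this hypothetical sketch (the `Task` fields, `build_contexts`, and `leaks` are illustrations, not a real framework's API), the main agent holds the whole trusted problem, the reviewer sees only the finished artifact and never the author's reasoning, and the sandboxed reader sees only the untrusted documents. Nothing else earns an agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task description; field names are illustrative."""
    problem: str
    secrets: list[str] = field(default_factory=list)
    author_notes: str = ""      # the author's own reasoning
    artifact: str = ""          # the finished work to be reviewed
    untrusted_docs: list[str] = field(default_factory=list)
    needs_review: bool = False  # correctness seam: skeptic must not be author


def build_contexts(task: Task) -> dict[str, list[str]]:
    """One agent by default; one more per real seam, each with a scoped context."""
    contexts = {"main": [task.problem, *task.secrets, task.author_notes]}
    if task.needs_review:
        # The reviewer gets the work, not the pride of authorship.
        contexts["reviewer"] = [task.artifact]
    if task.untrusted_docs:
        # The sandbox sees hostile text and nothing privileged.
        contexts["sandboxed_reader"] = list(task.untrusted_docs)
    return contexts


def leaks(contexts: dict[str, list[str]], task: Task) -> list[str]:
    """Names of non-main agents whose context contains any secret."""
    return [
        name
        for name, ctx in contexts.items()
        if name != "main" and any(s in item for s in task.secrets for item in ctx)
    ]


task = Task(
    problem="reconcile Q3 invoices",
    secrets=["sk-test-123"],  # placeholder secret, not a real key
    author_notes="assume one row per event",
    artifact="patch v1",
    untrusted_docs=["vendor email body"],
    needs_review=True,
)
contexts = build_contexts(task)
print(sorted(contexts))        # ['main', 'reviewer', 'sandboxed_reader']
print(leaks(contexts, task))   # []
```

A task with neither seam comes back with a single context, `main`, which is the default this section argues for.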
This is the same discipline a good engineering org applies to teams, just sped up. You don’t split a project across five people because it has five parts; you split it where the parts genuinely can’t be held by one person, and you keep everything else together so it stays coherent. Every extra boundary you add is a place information has to be re-transmitted, and every re-transmission is a place it gets garbled. The org that ships keeps its boundaries scarce and load-bearing. So do we.
The payoff is that when we do fan out, the fan-out means something. A second agent on a task is a signal that a real wall exists: a place where two minds are not a luxury but a requirement. Nobody on the team has to wonder whether the swarm is theater, because we don’t run swarms as theater. The agent count is a readout of how many genuine seams the task has, and most tasks have one.
The turn: the judgment is the part that doesn’t scale
You can buy more agents. You cannot buy the judgment that decides whether to.
That decision (fan out here, stay together there) is the entire skill, and it’s the one thing more compute will never hand you for free. A team that can spin up a hundred agents and points all hundred at every task hasn’t built capability; it’s built a very expensive way to spend a budget on coordination. The capability is in the restraint: the willingness to look at a problem that could be split, recognize that splitting it would only scatter a context that was already clean, and assign it to one agent who will simply solve it. That restraint reads, from the outside, like doing less. It is doing exactly enough.
The systems will keep getting cheaper to parallelize. The temptation to throw more agents at everything will only grow, because more agents will keep looking like more seriousness. None of that changes the rule underneath. Under a fixed budget with clean context, a single agent beats a swarm, and the engineer who knows when that’s true, and has the discipline to act on it instead of reaching for the impressive-looking crowd, is worth more than the crowd.
That’s the discipline we’re building at Apollo Space: an operating system that adds an agent because a seam demands it, never because a swarm looks impressive. If you’ve ever watched a team of nine ship less than a team of three, you already understand the rule, and you already know that knowing when not to fan out is the hardest, most valuable call in the room.