Engineering

When one agent should become many

Fanning a task out to a swarm of sub-agents is not free: it buys parallelism and pays for it in coordination. The skill is knowing which side of that trade a task is on.


Apollo Space Research

Apollo Space

· 10 min read

Give a task to one agent and it does the task. Give the same task to six agents and you now have a seventh problem: keeping the six from stepping on each other, contradicting each other, and confidently reporting six different versions of “done.” The work got faster. The accounting got harder.

That second cost is the one everyone forgets to put on the invoice. It’s also the one that decides whether splitting the work was a good idea or a bad one.

Fan-out is not free: it buys parallelism and pays for it in coordination, and the skill is knowing which side of that trade a task is on. This post is about the trade: when one agent should become many, how to make the many trustworthy, and the harder discipline almost nobody writes down, which is when to leave the one alone.

The naive view: more agents, more done

The obvious model is the one the demos sell. A task arrives. It’s big. So you chop it into pieces, hand each piece to its own agent, run them all at once, and staple the results back together at the end. Six agents, one-sixth the wall-clock time. What’s not to love?

It works beautifully on the tasks that were already independent. Summarize forty documents? Yes, fan that out; the documents don’t know about each other. Search ten sources in parallel? Yes; the sources don’t collide. For genuinely parallel work, the swarm is exactly right, and a single agent grinding through the list one item at a time is just slow on purpose.

Then you try it on a task whose pieces do know about each other, and the model breaks in a way that’s easy to miss until it bites.

Say you split “build this feature” into a piece that writes the function and a piece that writes the code that calls it. Run them in parallel and they’re each guessing at the other’s shape. One assumes the function returns a list; the other assumes a single value. Both finish. Both pass their own tests. Stapled together, they don’t fit, and now you’re spending the time you saved on parallelism untangling a seam that a single agent, holding both ends at once, would never have created.
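A minimal Python sketch of that seam. The names (`find_user_ids`, `badge_number`) and the data are hypothetical, invented only to illustrate the point: each piece passes the test its own author wrote, and the mismatch appears only when the real pieces are joined.

```python
from typing import Callable, List


def find_user_ids(name: str) -> List[int]:
    """Worker A's piece: assumes callers want every match, so it returns a list."""
    directory = {"ada": [7], "grace": [3, 9]}  # made-up data for the sketch
    return directory.get(name, [])


def badge_number(name: str, lookup: Callable) -> int:
    """Worker B's piece: written in parallel, assuming lookup returns one id."""
    return lookup(name) + 1000


# Each worker's own test passes, because each test encodes that worker's guess.
assert find_user_ids("grace") == [3, 9]
assert badge_number("ada", lambda _name: 7) == 1007  # B tests against its own stub


def seam_holds() -> bool:
    """Join the two real pieces and report whether they actually fit."""
    try:
        badge_number("ada", find_user_ids)
    except TypeError:  # a list plus an int is a type error
        return False
    return True
```

Here `seam_holds()` returns `False`. Neither worker's test could have caught that, because the bug is not in either piece; it is in the assumption each made about the other.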

The naive model treats coordination as zero. It is never zero. The moment two pieces of a task have to agree on anything (a shape, an order, a fact, a file), you’ve added a cost that grows with how many agents have to stay in sync. Sometimes that cost is small and the parallelism wins. Sometimes it eats the parallelism whole.

Fan out, then re-grade: never trust the swarm’s own report

Suppose you’ve decided a task really is parallel enough to split. You decompose it, you dispatch the workers, they run. Now comes the step the naive version skips entirely: you have to decide whether each piece actually came back done.

Here’s the trap. Each sub-agent reports its own result. Each one says, in its own words, “looks done.” And the agent that did the work is the worst possible judge of whether the work is finished: it wrote the test, so the test asserts what the code already does; it defined the goal, so the goal is whatever it happened to reach. Collect six self-graded “done”s and you don’t have six finished pieces. You have six claims, each signed by the one party with a reason to sign it.

So the orchestrator’s real job isn’t dispatching. Dispatching is the easy half. The real job is re-grading: pulling each returned piece and checking it against what the piece was actually supposed to do, with fresh eyes that have no pride in the result.

A claim is not a result.

That single rule changes the shape of the whole system. The orchestrator stops being a thing that collects answers and becomes a thing that interrogates them. It re-runs the flow the worker says it satisfied. It looks for the missing case rather than the present one. When a worker reports “complete,” the orchestrator treats that word as a hypothesis to disprove, not a fact to forward up the chain. Any piece that doesn’t survive the re-grade goes back, no matter how confident the report, no matter how green the worker’s own tests were.
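To make "looking for the missing case" concrete, here is a small Python sketch. Everything in it is assumed for illustration: the task (an average), the requirement that an empty input yields 0.0, and the names `worker_average`, `worker_self_test`, and `regrade_average`. It is one way such a check could look, not a description of any particular orchestrator.

```python
def worker_average(values):
    """The worker's implementation of the piece it was handed."""
    return sum(values) / len(values)


def worker_self_test() -> bool:
    """The worker's own test: it asserts what the code already does."""
    return worker_average([2, 4]) == 3


def regrade_average(fn):
    """Independent re-grade against the (assumed) requirement.

    Returns the inputs that fail; an empty list means the claim survived.
    """
    required = [
        ([2, 4], 3.0),
        ([5], 5.0),
        ([], 0.0),  # the case the worker never thought to write down
    ]
    failures = []
    for values, expected in required:
        try:
            ok = fn(values) == expected
        except Exception:  # a crash counts as a failed case, not as "done"
            ok = False
        if not ok:
            failures.append(values)
    return failures
```

With these definitions, `worker_self_test()` is `True` while `regrade_average(worker_average)` returns `[[]]`: the worker's report was honest and green, and the piece still goes back.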

Figure: A task fans out from an orchestrator into parallel sub-agents that each return a self-graded result; the orchestrator re-grades every claim against the real requirement, sends failures back, and merges only the pieces that survive an independent check.

The pattern, then, is three beats, not one. Decompose the task into pieces that can run apart. Parallelize the pieces across isolated workers so they can’t corrupt each other’s working space. Re-grade every result independently before any of it counts. Skip the third beat and you’ve built a faster way to ship work nobody checked, which is worse than one careful agent, not better.
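The three beats can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Apollo's implementation: `Report`, `orchestrate`, `demo_worker`, and `demo_regrade` are hypothetical names, workers are plain functions run on a thread pool, and the isolation of each worker's working space is not modeled at all.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Report:
    piece: str
    output: str
    claims_done: bool  # the worker's own verdict; deliberately never consulted


def orchestrate(
    pieces: List[str],
    worker: Callable[[str, int], Report],
    regrade: Callable[[str, str], bool],
    max_rounds: int = 2,
) -> Tuple[Dict[str, str], List[str]]:
    """Fan pieces out, re-grade every result, and send failures back."""
    accepted: Dict[str, str] = {}
    pending = list(pieces)  # beat 1: the caller has already decomposed the task
    for attempt in range(1, max_rounds + 1):
        if not pending:
            break
        with ThreadPoolExecutor() as pool:  # beat 2: run the pieces in parallel
            reports = list(pool.map(worker, pending, [attempt] * len(pending)))
        pending = []
        for report in reports:  # beat 3: an independent check decides, not the claim
            if regrade(report.piece, report.output):
                accepted[report.piece] = report.output
            else:
                pending.append(report.piece)
    return accepted, pending  # pending holds pieces that never survived a re-grade


def demo_worker(piece: str, attempt: int) -> Report:
    """Hypothetical worker: always claims done, but botches "b" on attempt 1."""
    output = "" if (piece == "b" and attempt == 1) else piece.upper()
    return Report(piece, output, claims_done=True)


def demo_regrade(piece: str, output: str) -> bool:
    """Hypothetical requirement: the output is the piece name upper-cased."""
    return output == piece.upper()
```

Calling `orchestrate(["a", "b", "c"], demo_worker, demo_regrade)` accepts all three pieces, but only because "b" was sent back once. With `max_rounds=1`, "b" comes back in the unresolved list instead of being merged on the strength of its own claim.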

The discipline the demos skip: when NOT to fan out

Now the part that’s harder to sell and more important to learn. Most tasks should not be fanned out, and the reason is the cost we keep refusing to count.

Picture the work as a graph. Each piece is a node; each “these two must agree” is an edge. A task with no edges (forty unrelated documents) is pure parallel gold; add agents and you add speed with nothing to pay. But edges are where coordination lives, and they multiply faster than nodes. Three pieces that all depend on each other aren’t just three pairwise relationships; they’re every pair plus the three-way whole. Fan that out and your agents spend more effort reconciling their guesses about each other than they’d have spent just doing the work in sequence.
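A quick way to see how fast edges outrun nodes, as a Python sketch. It assumes the worst case, where every piece must agree with every other; the function names are hypothetical. Counting every group of two or more pieces is one reading of "every pair plus the whole," not the only one.

```python
from math import comb


def pairwise_agreements(n_pieces: int) -> int:
    """Edges in a fully coupled task: one per pair of pieces."""
    return comb(n_pieces, 2)


def group_agreements(n_pieces: int) -> int:
    """Every subset of two or more pieces that could need a joint agreement."""
    return 2 ** n_pieces - n_pieces - 1


# Worst-case counts as the swarm grows.
growth = {n: (pairwise_agreements(n), group_agreements(n)) for n in (2, 3, 6)}
```

For three pieces that is 3 pairs, or 4 agreements once the three-way whole is counted. For six fully coupled pieces it is 15 pairs and 57 possible groups, while the node count has only doubled.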

The naive instinct is to ask “is this task big?” Big tasks feel like they want more hands. But size is the wrong question. A big task that’s mostly one long dependent chain (do this, then with that result do the next, then the next) has no parallelism to extract. Splitting it doesn’t get you six agents working; it gets you one agent working and five waiting, plus the overhead of passing the baton between strangers at every handoff.
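One hedged way to put a number on "no parallelism to extract" is the longest dependent chain in the task graph. The Python sketch below assumes every piece takes one unit of time, that the dependency map is non-empty and acyclic, and that every piece appears as a key; `critical_path_length` and `max_speedup` are illustrative names.

```python
from functools import lru_cache
from typing import Dict, List


def critical_path_length(deps: Dict[str, List[str]]) -> int:
    """Length of the longest chain of pieces that must run one after another."""

    @lru_cache(maxsize=None)
    def depth(piece: str) -> int:
        # A piece can start only after its deepest prerequisite has finished.
        return 1 + max((depth(d) for d in deps[piece]), default=0)

    return max((depth(p) for p in deps), default=0)


def max_speedup(deps: Dict[str, List[str]]) -> float:
    """Upper bound on the parallel speedup: total work over the critical path."""
    return len(deps) / critical_path_length(deps)


# Six pieces in one dependent chain versus six independent pieces.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"], "f": ["e"]}
independent = {name: [] for name in "abcdef"}
```

`max_speedup(chain)` is 1.0 and `max_speedup(independent)` is 6.0. Both tasks are the same size; only the second has anything for the extra agents to do.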

The right question isn’t “how big is this?” It’s “how much of this can actually happen at the same time without the pieces needing to agree?” That’s the only thing the parallel version buys you, and on a tightly-coupled task the answer is almost none. There, the single agent wins outright, not because it’s heroic, but because it holds the whole task in one head and never has to negotiate with a copy of itself.

The tell is coordination overhead exceeding the parallel gain. When the agents would spend more time syncing than working, you’ve crossed the line. And the failure of crossing it isn’t slowness; slowness you’d notice. It’s a subtle wrongness: six locally-correct pieces that are globally incoherent, each defensible on its own, none of them fitting. A bug no single worker is responsible for, because it lives in the seams between them.
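As a back-of-the-envelope illustration of that line, here is a toy cost model in Python. The formula and every number in it are assumptions made up for the sketch: work divides evenly across agents, and each seam costs a fixed amount of time to reconcile. Real overhead is rarely this tidy.

```python
def fanout_time(work: float, agents: int, seams: int, sync_cost: float) -> float:
    """Toy estimate: evenly divided work plus a fixed cost per seam."""
    return work / agents + seams * sync_cost


def fanout_pays_off(work: float, agents: int, seams: int, sync_cost: float) -> bool:
    """True when the split version beats one agent doing all the work serially."""
    return fanout_time(work, agents, seams, sync_cost) < work


# Sixty minutes of work split six ways, at five minutes per seam (made-up numbers).
no_seams = fanout_time(60, agents=6, seams=0, sync_cost=5)    # 10.0
all_seams = fanout_time(60, agents=6, seams=15, sync_cost=5)  # 85.0
```

With no seams the split finishes in 10 minutes instead of 60. With all fifteen pairs coupled it takes 85, which is slower than never splitting at all, and that is before counting the cost of pieces that do not fit.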

Fan-out buys parallelism and pays for it in coordination. On a task with no seams, you pay nothing and pocket the speed. On a task that’s all seams, you pay everything and pocket a coordination problem. The skill is reading the seams before you split.

Figure: A two-lane comparison. A loosely-coupled task splits cleanly into parallel agents with little to reconcile and finishes fast, while a tightly-coupled task split the same way produces locally-correct pieces that do not fit, where one agent holding the whole task would have been both simpler and safer.

The orchestrator’s actual job: routing, not multiplying

Put the two halves together and a quieter truth shows up. The hard part of running many agents was never running many agents. It’s deciding, per task, whether to.

A weak orchestrator has one move: split everything. It treats fan-out as the goal, so every task gets shattered into pieces whether it wanted to be or not, and the coordination tax gets paid on tasks that had no parallelism to pay it from. It looks busy. It is mostly busy reconciling problems it created.

A good orchestrator has two moves, and the judgment to pick. It reads a task for its seams. Loosely coupled and genuinely parallel: fan it out, then re-grade every piece. Tightly coupled, one long dependent chain: hand it to a single agent and get out of the way. The decision to not parallelize is as much a part of the job as the decision to parallelize, and a system that can only do one of those two is only half an orchestrator.
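The two moves can be written down as a routing function. The Python sketch below is a deliberately crude illustration: `Route`, `coupling`, `route`, and the `max_coupling` threshold of 0.2 are all assumptions, and it presumes the seams are known up front, which in practice is the hard part.

```python
from enum import Enum
from math import comb
from typing import Iterable, List, Tuple


class Route(Enum):
    FAN_OUT_THEN_REGRADE = "fan out, then re-grade every piece"
    SINGLE_AGENT = "hand to a single agent"


def coupling(pieces: List[str], seams: Iterable[Tuple[str, str]]) -> float:
    """Fraction of all possible pairs of pieces that must agree on something."""
    possible = comb(len(pieces), 2)
    distinct = {frozenset(pair) for pair in seams}  # (a, b) and (b, a) are one seam
    return len(distinct) / possible if possible else 0.0


def route(
    pieces: List[str],
    seams: Iterable[Tuple[str, str]],
    max_coupling: float = 0.2,  # arbitrary threshold, chosen only for the sketch
) -> Route:
    """Pick one of the two moves by reading the task's seams before splitting."""
    if len(pieces) < 2:
        return Route.SINGLE_AGENT  # nothing to split
    if coupling(pieces, seams) <= max_coupling:
        return Route.FAN_OUT_THEN_REGRADE
    return Route.SINGLE_AGENT
```

Forty unrelated documents with no seams route to `FAN_OUT_THEN_REGRADE`. A function and its caller, with one seam between two pieces, have a coupling of 1.0 and route to `SINGLE_AGENT`.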

This is why “how many agents” is the wrong frame to optimize. More agents is not a goal. The goal is the work, done and verified, for the least total cost, and total cost includes the coordination you’d be signing up for. Sometimes that math says six. Often it says one. The orchestrator earns its keep by being right about which.

The turn: coordination is a cost humans pay too

Step back from the agents and you’ll recognize this. You’ve lived it.

It’s the meeting that should have been one person’s afternoon. The project split across five people who now spend half their week in sync calls reconciling what they each assumed. The feature handed to a team when it wanted one focused engineer, and the seams between their pieces became the bug that shipped. Every company learns, usually the expensive way, that throwing more people at a task is not the same as getting it done faster; that past a point the coordination eats the gain, and the thing you needed was one person who could hold the whole problem at once.

Agents make that lesson literal, and cheap to learn. You can watch the coordination tax in the traces. You can see the six locally-correct pieces that don’t fit. You can measure the moment the syncing cost more than the parallelism saved. The math that took human organizations decades to feel is, for a swarm of agents, just numbers you can read.

So the question Apollo’s orchestration is built around isn’t “how do we run a hundred agents at once.” It’s the older, harder one underneath: given this specific work, what is the right number of minds, and how do we trust what each of them claims? Sometimes the answer is many, fanned out and independently re-graded. Sometimes the answer is one, left alone to think. Knowing the difference is the whole skill, for software and, it turns out, for companies.


That’s what we’re building at Apollo Space: not a swarm for its own sake, but an orchestrator that reads a task before it splits it, re-grades everything it splits, and is unembarrassed to hand the tightly-wound problem to a single agent and walk away. The cheapest coordination is the coordination you were wise enough not to create.
