Engineering

The graveyard of orphaned branches was the biggest source of lost work

A crashed agent doesn't lose what it pushed, it loses what only it could see.

Apollo Space Research

Apollo Space

May 24, 2026 · 10 min read

Go looking for a single missing change in a busy multi-agent fleet and you can find a mass grave instead. Dozens of abandoned working directories. Stashed edits nobody can date. Whole design documents that exist nowhere except a branch no human ever checked out. None of it deleted. None of it committed either. Just sitting in the gap between “an agent did the work” and “anyone could find it”, the most expensive gap in a fleet, and the one almost nobody designs for.

Every one of those agents had pushed its code. That was never the problem.

A crashed agent doesn’t lose what it pushed, it loses what only it could see.

This post is about that sentence: why the visible artifact survives a crash, why the invisible context doesn’t, and what it took to stop losing the half that matters.

The naive version: one agent, one checkout, trust the push

The obvious way to run agents on a codebase is the way you’d run yourself. You give an agent a repo, it makes a branch, it works, it commits, it pushes. When it’s done, the branch is on the remote and you sleep fine, because a pushed branch is immortal: no laptop crash, no killed process, no out-of-memory reaper can take a commit that’s already on the server.

That model is correct, and it’s also a trap, because it quietly teaches you to equate “pushed” with “safe.”

Run one agent and the equation holds. Run twenty, overnight, each chewing on a different slice of the same monorepo, and you discover what “pushed” actually covers. It covers the diff. It does not cover the reasoning that produced the diff, the half-finished plan the agent was three steps into, the design doc it had written in its scratch directory but not yet committed, the dead-end it had ruled out so the next agent wouldn’t waste a night re-ruling it out. A push is a snapshot of the output. The agent’s working memory, what it knew, what it was about to do, what it had already learned not to try, was never in the snapshot.

So when one of those twenty agents died mid-task, and at twenty agents, one always dies, the commit it had pushed survived perfectly. Everything it hadn’t committed yet died with it. And the cruel part is that the uncommitted stuff was usually the valuable stuff: not the boilerplate it had already shipped, but the in-progress understanding it was still holding in its head.

A crashed agent doesn’t lose what it pushed, it loses what only it could see.

Why the graveyard fills up faster than you’d think

The single checkout makes it worse, and this is the failure mode that turns a tidy fleet into a graveyard.

Two agents in one working directory cannot both win. Git has exactly one HEAD per checkout, one index, one set of tracked files. When agent A is halfway through editing twelve files and agent B, sharing the same directory, runs a git stash or a reset --hard or a forced git switch to “get out of the way,” A’s twelve files of uncommitted work evaporate. At best they land in a stash entry A doesn’t know exists. At worst, after a reset --hard, they go not into a branch but into nothing. B wasn’t malicious; B was being tidy. Tidiness in a shared checkout is how you delete your colleague’s afternoon.

In a shared checkout, “cleaning up” is indistinguishable from destroying work that isn’t yours.

And nobody notices in the moment, because the destruction is silent. No error. No conflict marker. The files just go back to their committed state, A’s edits gone, A none the wiser. A is a process; it doesn’t feel the loss, it just continues from a state that’s quietly missing an hour of thinking. By the time a human goes looking for that change weeks later, all that’s left is a stash with a cryptic auto-message, or an orphaned branch with a name nobody recognizes, or nothing at all.

Figure: Two agents share one checkout. The first is mid-edit on uncommitted files while the second runs a tidy-up that silently resets the directory, and the first agent's working context is gone with no error and no trace.

Multiply that by a fleet running every night and the graveyard isn’t an accident. It’s the expected output of the naive design. You will lose work in proportion to how many agents you run, because every one of them is a process that can die, and every shared directory is a place one agent can overwrite another. The surprising number isn’t how much got lost. It’s how much got lost silently: the losses that never threw an error and so never asked anyone to look.

The fix in one line: never share a writer, never trust working memory

The discipline that kills the graveyard is two rules, and neither is clever. They’re just the rules you arrive at after you’ve lost the third design document.

The first rule: one working tree per writer. Every agent that can write gets its own isolated checkout, its own HEAD, its own index, its own files, and no two write-capable agents ever share one. Git has a built-in tool for exactly this, git worktree, and the working trees it makes are cheap to create and cheap to throw away. The point isn’t the tooling. The point is that an agent can no longer reach into another agent’s directory and reset it, because there is no other agent in its directory. You make the collision impossible instead of asking everyone to be careful, and “impossible” beats “careful” every single night.
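As a rough illustration of the first rule, here is a minimal Python sketch that gives each agent its own working tree through git worktree add. The directory layout (one folder per agent under fleet_root), the agent/<id> branch naming, and the origin/main base are assumptions made for the example, not details from our setup. The run parameter is also an assumption: it lets the caller swap in a different command runner.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner receives the full git argument list and executes it.
Runner = Callable[[Sequence[str]], object]


def run_git(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if git exits non-zero."""
    subprocess.run(list(cmd), check=True)


def worktree_add_command(repo: Path, fleet_root: Path, agent_id: str,
                         base: str = "origin/main") -> list[str]:
    """Build the `git worktree add` call for one agent's isolated checkout."""
    tree_path = fleet_root / agent_id   # hypothetical layout: one directory per agent
    branch = f"agent/{agent_id}"        # hypothetical branch naming scheme
    # -b creates a new branch; the tree gets its own HEAD, index, and files.
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", branch, str(tree_path), base]


def create_worktree(repo: Path, fleet_root: Path, agent_id: str,
                    run: Runner = run_git) -> Path:
    """Create the agent's working tree and return its path."""
    run(worktree_add_command(repo, fleet_root, agent_id))
    return fleet_root / agent_id
```

Calling create_worktree(Path("/repo"), Path("/fleet"), "a1") would run git worktree add against /repo and hand back /fleet/a1. Git itself refuses to check out the same branch in two working trees, which reinforces the one-writer-per-tree rule.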

The second rule: commit early, push often, and treat working memory as already lost. The fix for losing uncommitted context isn’t a better backup. It’s refusing to let context stay uncommitted. An agent that writes a plan commits the plan before it acts on it. An agent that rules out a dead-end writes that down and pushes it, so the next agent inherits the lesson instead of re-learning it. The model is simple to the point of being grim: assume this process will die in the next sixty seconds, and make sure that when it does, everything it knew is already on the remote where a crash can’t reach it.
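The second rule can be sketched as a small checkpoint helper that an agent calls every time it learns something worth keeping. This is an illustration under assumptions, not our implementation: the NOTES.md file name, the origin remote, and the injectable run parameter are all hypothetical choices for the example.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes a git argument list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_git(cmd: Sequence[str]) -> str:
    """Default runner: raise on failure, return captured stdout."""
    done = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return done.stdout


def checkpoint(tree: Path, note: str, message: str, run: Runner = run_git) -> bool:
    """Append a note to the tree, then commit and push everything in it.

    Returns True if a commit was pushed, False if git saw nothing to commit.
    """
    notes = tree / "NOTES.md"  # hypothetical file for plans and ruled-out dead-ends
    with notes.open("a", encoding="utf-8") as fh:
        fh.write(note.rstrip() + "\n")

    git = ["git", "-C", str(tree)]
    # `status --porcelain` prints nothing when the tree is clean.
    if not run(git + ["status", "--porcelain"]).strip():
        return False
    run(git + ["add", "--all"])
    run(git + ["commit", "-m", message])
    # Push the current branch to a same-named branch on the remote.
    run(git + ["push", "--set-upstream", "origin", "HEAD"])
    return True
```

An agent would call something like checkpoint(tree, "ruled out approach X: breaks the cache", "wip: record dead-end") before acting on a plan, so the note is on the remote before the process has a chance to die.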

A crashed agent doesn’t lose what it pushed, it loses what only it could see. So the whole job is to keep shrinking the set of things only it can see, until there’s nothing left in that set worth losing.

The registry: knowing what’s alive before you touch it

There’s one more piece, and it’s the one people skip because it sounds like bureaucracy.

Isolated checkouts solve collisions. They create a new problem: now you have many checkouts, and no single place tells you which ones are alive, which agent owns each, and which were abandoned by a process that died last Tuesday. Without that map, you’re back to archaeology, staring at a directory of working trees, unable to tell the active from the dead, afraid to clean any of them up because one might hold the only copy of something.

So every working tree registers itself: its path, its branch, the session that owns it, the moment it was claimed. A live ledger of who-holds-what. Before any agent, or any human, does something destructive, it reads the ledger first. Does this tree belong to it? Is it dirty with work somebody else authored? If it’s dirty with work it didn’t write, it doesn’t get to be tidy. Either it rescues that work first, committing it and pushing it to a clearly named branch, or it leaves the tree alone and asks.

Figure: Each agent claims its own isolated working tree in a shared registry that records path, branch, owner, and claim time, so a reaper that finds an abandoned tree can rescue its work to a named branch before pruning instead of guessing what is safe to delete.
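To make the ledger concrete, here is a minimal sketch of a registry holding the four fields described above. The JSON-file storage, the class and method names, and the PermissionError on a conflicting claim are all assumptions for illustration. It also does no cross-process locking, which a real fleet would need (a lock file or a small database, for instance).

```python
import json
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict


@dataclass
class Claim:
    path: str          # where the working tree lives
    branch: str        # branch checked out there
    owner: str         # session id of the agent holding it
    claimed_at: float  # Unix timestamp of the claim


class Registry:
    """A JSON-file ledger of who holds which working tree (hypothetical format)."""

    def __init__(self, ledger: Path) -> None:
        self.ledger = ledger

    def _load(self) -> Dict[str, Claim]:
        if not self.ledger.exists():
            return {}
        raw = json.loads(self.ledger.read_text(encoding="utf-8"))
        return {path: Claim(**fields) for path, fields in raw.items()}

    def _save(self, claims: Dict[str, Claim]) -> None:
        tmp = self.ledger.with_suffix(".tmp")
        payload = {path: asdict(claim) for path, claim in claims.items()}
        tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        tmp.replace(self.ledger)  # rename so readers never see a half-written file

    def claim(self, path: str, branch: str, owner: str) -> Claim:
        """Register a tree; refuse if another session already holds it."""
        claims = self._load()
        existing = claims.get(path)
        if existing is not None and existing.owner != owner:
            raise PermissionError(f"{path} is held by {existing.owner}")
        claims[path] = Claim(path, branch, owner, time.time())
        self._save(claims)
        return claims[path]

    def may_modify(self, path: str, session: str) -> bool:
        """True only if the ledger says this session owns the tree."""
        found = self._load().get(path)
        return found is not None and found.owner == session
```

The intended use is that any destructive step starts with registry.may_modify(tree, my_session) and backs off when the answer is False, including for trees the ledger has never heard of.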

That ledger is what turns cleanup from a gamble into a procedure. A reaper process can sweep the fleet, find the trees whose owning session died, and, crucially, rescue first, prune second. Anything uncommitted gets committed to a rescue/ branch and pushed before the tree is removed. Nothing gets deleted until its contents are immortal somewhere. The graveyard stops filling because every burial now starts with checking for a pulse.
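The rescue-first, prune-second ordering can be sketched as a single function. Treat it as an illustration under assumptions: the rescue/<owner> branch name, the origin remote, and the injectable run parameter are hypothetical. The property it tries to show is that git worktree remove is only reached after the push has succeeded, because any failing command raises first.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes a git argument list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_git(cmd: Sequence[str]) -> str:
    """Default runner: raise on failure, return captured stdout."""
    done = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return done.stdout


def rescue_then_prune(repo: str, tree: str, owner: str,
                      run: Runner = run_git) -> Optional[str]:
    """Rescue uncommitted work from a dead session's tree, then remove the tree.

    Returns the rescue branch name, or None if the tree was already clean.
    """
    git = ["git", "-C", tree]
    rescue_branch = None
    # Non-empty porcelain output means modified or untracked files exist.
    if run(git + ["status", "--porcelain"]).strip():
        rescue_branch = f"rescue/{owner}"  # hypothetical naming scheme
        run(git + ["add", "--all"])
        run(git + ["commit", "-m", f"rescue: uncommitted work from {owner}"])
        # Push the rescue commit to a named branch on the remote.
        run(git + ["push", "origin", f"HEAD:refs/heads/{rescue_branch}"])
    # Only reached if every step above succeeded.
    run(["git", "-C", repo, "worktree", "remove", tree])
    return rescue_branch
```

One limitation of this sketch: a clean tree may still hold commits that were never pushed. Removing a working tree does not delete its branch, so those commits stay in the local repository, but a fuller reaper would push them as well before pruning.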

Picture the difference at the end of a long overnight run. The naive fleet leaves behind a field of unmarked directories and a human who has to excavate them by hand, never sure they got everything. The disciplined fleet leaves behind a registry that says exactly which trees are done, which are still running, and which were rescued, and a remote where every branch has a name that tells you who made it and why.

The turn: the work you lose is the work you can’t see you lost

Here is the quiet thing underneath all of it.

The losses that hurt a team are never the loud ones. A failed build screams. A merge conflict blocks you until you resolve it. Those are fine: they announce themselves, and a problem that announces itself gets fixed. The losses that actually compound are the silent ones: the design doc that died on an orphaned branch, the dead-end one agent already explored that the next three agents will explore again, the hour of reasoning that vanished in a tidy reset --hard and threw no error to mark its passing. You don’t debug those, because debugging starts with noticing, and a silent loss is precisely the loss nobody noticed.

What we really built wasn’t a git convention. It was a way to make the silent losses loud, to convert “an agent’s context disappeared and we’ll find out in three weeks” into “an agent’s context is committed, pushed, registered, and rescuable, so it cannot disappear without someone seeing.” The discipline isn’t about git. It’s about refusing to let a fleet lose the half of its work that doesn’t show up in a diff.

A crashed agent doesn’t lose what it pushed, it loses what only it could see. The entire job is to make sure nothing important is ever in that second category. Commit it, push it, register it, and a crash becomes a non-event: a process dies, another picks up exactly where it left off, and the morning’s PR looks like nobody stumbled at all.

That’s the property a fleet has to have before you trust it with a night’s work unattended. Not that no agent ever dies; they die constantly, that’s a process for you. But that when one does, it takes nothing irreplaceable with it.

That’s what we’re building at Apollo Space: a system where an agent crashing at 3am is a shrug, not an investigation, because the work it could see was never the only place that work lived. If you’ve ever spent a Monday excavating a teammate’s lost branch to recover a change everyone swore was finished, you already know why we treat the graveyard as a design failure and not a fact of life.
