Engineering

Trust is a ladder, not a switch

A new agent earns autonomy the way a new hire does, one verified task at a time.

Apollo Space Research

Apollo Space

· 10 min read

No sane company hands a new hire the keys to the wire-transfer system on day one. They give them a read-only login, a small first task, and a manager who checks the output. Pass that, and the leash gets a little longer. Pass enough of them, and one day nobody is checking anymore, because the record already says they can be trusted. We onboard people on a ramp. Then we turn around and deploy AI agents with a single toggle: off, or fully autonomous, nothing in between.

That toggle is the reason most teams are stuck. They can’t flip it on, because a brand-new agent with full authority is terrifying. So they leave it off, and the agent stays a clever demo that never touches anything that matters.

This post is about the ramp we built instead of the toggle, and why the most important number in an autonomous system isn’t how smart the model is. It’s how much it’s allowed to do without you watching.

The naive version: one switch, two bad answers

The obvious way to ship an agent is to decide, up front, how much you trust it. You set a permission level once, at configuration time, and the agent operates at that level forever.

This sounds reasonable until you try to pick the level.

Set it low, and the agent is useless. Every action waits for a human to approve it, so the thing you bought to save attention now spends your attention all day. You read every draft, confirm every step, click yes on operations the agent has gotten right a hundred times. The agent isn’t autonomous. It’s a very expensive autocomplete with a confirmation dialog.

Set it high, and the agent is dangerous. The first time it confidently does the wrong thing (sends the wrong message to the wrong list, edits the wrong record, runs the irreversible operation on real data), you stop trusting it anywhere. One bad action at full authority doesn’t cost you one task. It costs you the whole relationship. You revoke everything and go back to doing it yourself, because now you’ve felt what unsupervised wrong looks like.

So the single switch forces a choice between useless and dangerous, and there’s no setting in the middle that’s both safe and worth having. The problem isn’t the agent. The problem is that trust was modeled as a state, on or off, when trust is not a state. Trust is a history. It’s the accumulated record of times this specific actor did this specific kind of thing and it turned out fine.

A human team never had a trust switch. It had a ramp.

The ramp: scope earned per verified task

We stopped asking “how much do you trust this agent?” and started asking a different question, the one a good manager actually asks: “what has this agent proven it can do?”

The answer is not one number. It’s a record, task type by task type, of what the agent has done and how it went.

Figure: Two ways to grant an agent authority. On the left, a single switch forces a choice between low permission, which makes the agent useless, and high permission, which makes one wrong action catastrophic. On the right, a ladder where the agent starts read-only, proposes its first writes for approval, earns auto-execute on the task types it has gotten right, and only the rare high-stakes action still asks first.

A new agent starts at the bottom rung: read-only. It can look at everything and change nothing. It drafts, it proposes, it explains what it would do, and a human approves each action before it happens. This is the same boring start every new hire gets, and it’s boring on purpose. The agent is building a record before it’s trusted with consequences.

Each time the agent proposes an action, a human approves it, and the outcome is good, that counts. Not in a vague way: it counts for that specific task type. Drafting a summary is one kind of trust. Updating a customer record is another. Triggering something that costs money or can’t be undone is a third, and they don’t transfer. An agent that has earned the right to update records on its own has earned nothing about moving money. The ramp is per-skill, the way a person who’s great at the books still doesn’t get to sign contracts.

Accumulate enough good outcomes on a task type, and that task type graduates. The agent stops asking permission for it and just does it, logging what it did. The human who used to approve every instance now reviews a feed of completed actions instead of a queue of pending ones, and only when something looks off. The leash got longer exactly where the record earned it, and stayed short everywhere it hadn’t.
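A minimal sketch of that ramp, under stated assumptions: the class name, the task-type strings, and the numeric threshold are all hypothetical. The post only says "enough good outcomes", so the number here is illustrative, not the one we use.

```python
from collections import defaultdict
from enum import Enum


class Mode(Enum):
    PROPOSE = "propose-and-approve"  # a human approves before anything happens
    AUTO = "auto-execute"            # the agent acts, then logs what it did


# Hypothetical bar; the right value depends on the task type and its risk.
GRADUATION_THRESHOLD = 20


class TrustLedger:
    """Counts verified good outcomes separately for each task type."""

    def __init__(self, threshold: int = GRADUATION_THRESHOLD) -> None:
        self.threshold = threshold
        self.verified = defaultdict(int)  # task type -> approved, good outcomes

    def record_verified(self, task_type: str) -> None:
        """A human approved the proposed action and the outcome was good."""
        self.verified[task_type] += 1

    def mode_for(self, task_type: str) -> Mode:
        """Task types with no record, or too short a record, still ask first."""
        if self.verified.get(task_type, 0) >= self.threshold:
            return Mode.AUTO
        return Mode.PROPOSE


ledger = TrustLedger(threshold=3)  # small threshold to keep the demo short
for _ in range(3):
    ledger.record_verified("draft_summary")

print(ledger.mode_for("draft_summary"))  # Mode.AUTO
print(ledger.mode_for("issue_refund"))   # Mode.PROPOSE
```

The design choice worth noticing is the default: a task type the ledger has never seen gets the bottom rung, so new capabilities start supervised without anyone having to remember to configure them.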

Autonomy isn’t a setting you grant. It’s a balance the agent earns, one verified task at a time.

Why per-task, and not one global trust score

There’s a tidier-looking version of this that quietly fails, and it’s worth walking through, because it’s the one most people build first.

The tidy version is a single trust score. The agent does well, the number goes up; it does badly, the number goes down; above some threshold, it’s allowed to act on its own. One dial, easy to reason about, easy to show on a dashboard.

It fails the first time the agent is excellent at the easy thing and the score lets it loose on the hard thing.

An agent can write a hundred flawless summaries and earn a sky-high global score, and that score says nothing about whether it should be allowed to issue a refund. Summaries and refunds share a model, but they don’t share a risk. A global score launders competence at low-stakes work into authority over high-stakes work, which is exactly the mistake you’d never make with a person. The new analyst who’s brilliant at research still doesn’t get signing authority, no matter how good the research is, because good at one thing was never evidence of safe at another.

So we keep the ledger split. Trust is tracked per task type, and authority on one says nothing about authority on another. The agent that has earned auto-execute on data lookups is still on read-only for anything that spends money, and it stays there until it has a record on that, separately. The dashboard is messier. The system is honest about what’s actually been proven.
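The difference between the two designs fits in a few lines. This is an illustrative sketch, not our implementation: the success-rate bar, the minimum sample size, and the task-type names are assumptions chosen to make the contrast visible.

```python
THRESHOLD = 0.9    # hypothetical success-rate bar
MIN_SAMPLES = 10   # hypothetical minimum record before anything graduates

# (task_type, outcome_was_good): a hundred flawless summaries, zero refunds.
history = [("draft_summary", True)] * 100


def global_allows(history, task_type):
    """Single trust score: the requested task type is never consulted."""
    good = sum(ok for _, ok in history)
    return len(history) >= MIN_SAMPLES and good / len(history) >= THRESHOLD


def per_task_allows(history, task_type):
    """Split ledger: only outcomes of this exact task type count."""
    outcomes = [ok for t, ok in history if t == task_type]
    return (
        len(outcomes) >= MIN_SAMPLES
        and sum(outcomes) / len(outcomes) >= THRESHOLD
    )


print(global_allows(history, "issue_refund"))     # True: summaries bought refund authority
print(per_task_allows(history, "issue_refund"))   # False: no refund record exists
print(per_task_allows(history, "draft_summary"))  # True: earned on its own record
```

Both functions read the same history. The only difference is whether the filter on task type exists, and that one line is what keeps summary competence from turning into refund authority.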

There’s a second thing the per-task ledger buys you, and it’s the one enterprise buyers care about most. When the agent does act on its own, there’s a clean answer to “why was it allowed to?” Not “the trust score was high.” Instead: this exact action type graduated after this many verified outcomes, here is the record, here is who approved the ones that built it. Autonomy you can’t explain is autonomy you can’t sign off on. The ledger is the explanation.
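One way that explanation could be produced, sketched under assumptions: the record fields (`action_id`, `approved_by`), the class names, and the threshold are hypothetical, chosen only to show that the answer falls out of the stored record rather than a computed score.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VerifiedOutcome:
    task_type: str
    action_id: str    # hypothetical identifier for the approved action
    approved_by: str  # the human who approved it


@dataclass
class AuditableLedger:
    threshold: int  # hypothetical graduation bar
    outcomes: list = field(default_factory=list)

    def record(self, outcome: VerifiedOutcome) -> None:
        self.outcomes.append(outcome)

    def explain(self, task_type: str) -> dict:
        """Answer 'why was it allowed to?' from the record itself."""
        record = [o for o in self.outcomes if o.task_type == task_type]
        return {
            "task_type": task_type,
            "verified_outcomes": len(record),
            "threshold": self.threshold,
            "auto_execute": len(record) >= self.threshold,
            "approvers": sorted({o.approved_by for o in record}),
            "action_ids": [o.action_id for o in record],
        }


audit = AuditableLedger(threshold=2)
audit.record(VerifiedOutcome("update_record", "a-001", "dana"))
audit.record(VerifiedOutcome("update_record", "a-002", "lee"))

report = audit.explain("update_record")
print(report["auto_execute"], report["approvers"])  # True ['dana', 'lee']
```

Asking the same question about a task type with no record returns zero verified outcomes and `auto_execute: False`, which is itself a useful answer for a reviewer.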

Figure: A single trust ledger split by task type. Drafting a summary has many verified outcomes and runs on auto-execute. Updating a record has a few and runs on auto-execute for that type only. Moving money has none yet and still routes every time to a human for approval. Competence at one task never leaks into authority over another.

What demotion looks like, because trust runs both ways

A ladder you can only climb isn’t a trust model. It’s a countdown to the day the agent does something it earned the right to do and gets it wrong anyway.

Real trust runs both directions. A new hire who’s been cleared to handle something on their own, and then botches it badly, goes back to having their work checked for a while. Nobody fires them. Nobody treats one mistake as proof they’re hopeless. The leash just gets shorter on that one thing until the record is rebuilt. That’s not punishment. It’s how a healthy team stays both fast and safe.

The ramp works the same way. A task type that graduated to auto-execute can be demoted back to propose-and-approve when an outcome goes bad. That happens automatically, on the signal, not after a postmortem three weeks later. The agent that earned its way up a rung can lose that rung on the specific skill where it slipped, and keep every other rung it earned. Demotion is narrow on purpose: one bad refund pulls back authority over refunds, not over the summaries the agent is still flawless at.
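Narrow demotion can be sketched as a per-task streak that a bad outcome resets. The names and the threshold are hypothetical, and resetting the record all the way to zero is an assumption made for simplicity; a real policy might only shorten it.

```python
class Ramp:
    """Tracks consecutive good outcomes per task type."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # hypothetical graduation bar
        self.streak = {}            # task type -> good outcomes since last failure

    def record(self, task_type: str, good: bool) -> None:
        if good:
            self.streak[task_type] = self.streak.get(task_type, 0) + 1
        else:
            # Demote on the signal: only this task type's record is reset.
            self.streak[task_type] = 0

    def auto_execute(self, task_type: str) -> bool:
        return self.streak.get(task_type, 0) >= self.threshold


ramp = Ramp(threshold=2)
for task_type in ("draft_summary", "issue_refund"):
    ramp.record(task_type, good=True)
    ramp.record(task_type, good=True)

ramp.record("issue_refund", good=False)  # one bad refund

print(ramp.auto_execute("issue_refund"))   # False: back to propose-and-approve
print(ramp.auto_execute("draft_summary"))  # True: untouched by the refund failure
```

Because the reset touches a single key, the rung can be re-earned the same way it was earned the first time: by rebuilding the record on that task type alone.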

This is the part that makes the whole thing safe enough to actually turn on. The reason teams leave the autonomy switch off is the fear of the irreversible, unsupervised mistake: the action that’s both wrong and uncatchable. A ramp with real demotion changes the risk math. The worst an over-trusted task type can do is one bad outcome before it’s pulled back to asking permission, and the highest-stakes actions never graduated in the first place. You’re not betting the company on the agent never being wrong. You’re betting that when it’s wrong, the blast radius is one task type and the system catches it.

The point isn’t an agent that never fails. It’s a system where a failure costs one rung, not the whole relationship.

The turn: the thing the ledger is really measuring

Look closely at what’s actually accumulating on that ramp, and it isn’t the agent’s intelligence. The model was as capable on day one as it is on day ninety. Nothing about its raw ability changed while it climbed.

What changed is on your side. The ledger isn’t a record of the agent getting smarter. It’s a record of you getting to relax: proof, action by action, that this particular kind of work no longer needs your eyes on it. Every graduated task type is a thing that used to live in your head as a worry and now doesn’t. That’s the real product of the ramp: not a more powerful agent, but a steadily shorter list of things you have to personally hold.

We didn’t invent that arc. It’s the oldest thing in the working world. You earn the trust of the people around you the slow way, one kept promise at a time, and the reward is that they stop double-checking you, and you stop double-checking each other, and the whole team gets faster because the trust is in the record instead of in everyone’s anxiety. A new agent earns autonomy the way a new hire does, and the day you stop watching a task type is the day it finally starts saving you the thing you actually wanted back, which was never the keystrokes. It was the worrying.

The smartest model in the world deployed behind a single switch is still useless or still dangerous. The thing that makes an agent worth handing real work to isn’t a better mind. It’s a record you can point at and a leash that gets longer exactly as fast as that record earns it.


That’s what we’re building at Apollo Space: agents that start the day they’re hired with read-only access and a manager’s patience, and earn their way to autonomy the way the best people on your team did. If you’ve ever been the one who finally stopped checking someone’s work, and felt the day get lighter the moment you did, that’s the feeling we think software should give you next.
