Letting an agent run code is a containment problem, not a trust problem
Stop asking whether you trust the agent. The right question is what happens when it runs the wrong thing, and the answer is a sandbox that assumes it will.
Apollo Space Research
An agent writes a five-line Python script to parse a spreadsheet, runs it, and hands you a clean table. Useful. Now imagine the same agent, on a bad day, writes a script that reads every credential on the machine and posts them to an address it just made up. Same agent. Same permission to run code. The only thing standing between the two outcomes is what the code can reach when it runs.
That gap is the whole subject of this post. Most teams approach it by asking the wrong question.
The wrong question is: do I trust this agent enough to let it run code? You will never get a satisfying answer, because the agent is non-deterministic and the failure you fear is the one it hasn’t shown you yet. Letting an agent execute is a containment problem, not a trust problem. The job isn’t to be sure it behaves. The job is to make misbehavior cost nothing.
The naive way: run it where everything is
The obvious way to give an agent the power to run code is to give it a shell. It already lives on a server. The server has Python, Node, a network connection, and a set of environment variables. So you let the agent shell out, and you watch it work, and for an afternoon it’s magic.
Then you think about what that shell can touch.
It can touch every environment variable in the process, which on most servers means the database password, the third-party API keys, the cloud credentials. It can open a socket to anywhere on the internet, and anywhere on your private network too. It can read the filesystem the host can read, including other tenants’ data if you’re a multi-tenant product. It can spin in an infinite loop and pin a CPU until the box falls over. None of this requires the agent to be malicious. It requires the agent to be wrong once, or to be talked into being wrong by a prompt buried in a document it was asked to summarize.
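To make the environment-variable point concrete, here is a minimal sketch, assuming a Linux or macOS host. The variable name `DEMO_DB_PASSWORD`, its value, and the helper `run_snippet` are made up for illustration. It shows that a child process inherits every variable the parent holds unless you say otherwise.

```python
import os
import subprocess
import sys

# Hypothetical secret, planted here only so the demo has something to find.
os.environ["DEMO_DB_PASSWORD"] = "hunter2"

# Stand-in for agent-written code: it simply looks around its environment.
SNIPPET = "import os; print(os.environ.get('DEMO_DB_PASSWORD', 'nothing here'))"


def run_snippet(env=None):
    """Run SNIPPET in a child process and return what it printed.

    env=None means "inherit everything the parent has", which is the
    default behaviour of subprocess and of most shell-out helpers.
    """
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


inherited = run_snippet()       # child sees the parent's secret
scrubbed = run_snippet(env={})  # child starts with an empty environment
print(inherited, "|", scrubbed)
```

An empty environment is not a sandbox by itself: the child can still read the host filesystem and open sockets. The sketch only shows how little it takes for code to see what the host sees.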
The naive setup has a single point of failure and it’s the agent’s judgment. You’ve bet the company on a model not making a mistake. That’s not a bet you can win at scale; it’s a bet you lose eventually and quietly. The day it fails, there’s no boundary to catch it, because you built the system as if the boundary were the agent’s good sense.
A shell on the host is the agent equivalent of running a stranger’s program as root. We don’t do that with strangers. We shouldn’t do it with agents either, and for the same reason: the danger isn’t who they are, it’s what they can reach.
The shape of the fix: a place where running the wrong thing is boring
Here’s the reframe the rest of the post builds on. You stop trying to guarantee the code is safe, and you start guaranteeing that unsafe code can’t do anything interesting. You move the execution off the host and into a box built so that the worst case is a shrug.
So you build a container, a real one, and you make four promises about it, each of which removes one of the things the naive shell could reach.
The four promises are: isolation (the code runs in its own box, not yours), resource caps (the box can’t outgrow its allowance), no ambient secrets (the box starts with nothing worth stealing), and ephemeral by default (the box is destroyed the moment the task ends, so nothing survives to be exploited later). Take them one at a time, because each one closes a specific door the naive version left open.
Isolation: the code runs in its own box, not yours
The naive failure was that the agent’s code ran in your process, on your host. Whatever the host could see, the code could see. Isolation breaks that line.
The naive instinct here is a softer one that doesn’t actually work: run the code in the same process but “carefully.” Validate the script first, scan it for dangerous calls, refuse anything that imports the wrong library. This is a denylist, and denylists lose. There are too many ways to read a file or open a socket; you will never enumerate them all, and the one you miss is the one that hurts. Worse, you’ve now coupled your safety to your ability to predict every dangerous pattern in advance. That is the same losing bet as trusting the agent, just dressed as engineering.
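A small sketch of why the denylist loses. The scanner `naive_scan` and the `DENYLIST` set are hypothetical, written only for this illustration. The scanner rejects scripts that import a denied module by name, and a script that assembles the module name at runtime passes untouched.

```python
import ast

# Hypothetical denylist of modules a naive scanner refuses to see imported.
DENYLIST = {"os", "socket", "subprocess"}


def naive_scan(source: str) -> bool:
    """Return True if the script passes the denylist, i.e. it looks safe."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in DENYLIST for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in DENYLIST:
                return False
    return True


obvious = "import os\nleaked = sorted(os.environ)"

# Same effect, but the module name is built at runtime, so no import
# statement in the source ever names a denied module.
sneaky = "m = __import__('o' + 's')\nleaked = sorted(m.environ)"

print(naive_scan(obvious))  # False: the scanner catches the plain import
print(naive_scan(sneaky))   # True: the scanner waves the same behaviour through
```

You could patch the scanner to flag `__import__`, and the next script would use `importlib`, or `getattr` on an object it already holds. Each patch closes one path and leaves the rest open.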
The real fix is a wall, not a filter. The code runs in a separate sandbox, its own container or microVM, with its own kernel view, its own process table, its own filesystem mount. It doesn’t read your environment because it isn’t in your environment. When it asks the operating system “what files exist,” the answer is the sandbox’s files, not the host’s. The boundary isn’t a rule the code agrees to follow. It’s a wall the code physically cannot see past.
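As one possible shape for that wall, the sketch below builds a `docker run` command for a single untrusted script. It assumes Docker as the container runtime, which this post does not specify. The function name `sandbox_command`, the mount point `/task/main.py`, and the example image and path are illustrative choices. The function only builds the argument list; it does not start anything.

```python
import shlex


def sandbox_command(image: str, script_path: str) -> list[str]:
    """Build (but do not run) a `docker run` invocation for one script.

    script_path is assumed to be an absolute path on the host.
    """
    return [
        "docker", "run",
        "--rm",                                 # remove the container when it exits
        "--network", "none",                    # no network beyond loopback
        "--read-only",                          # root filesystem is read-only
        "--tmpfs", "/tmp",                      # in-memory scratch space
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--user", "65534:65534",                # unprivileged uid:gid
        "--volume", f"{script_path}:/task/main.py:ro",  # mount only the script
        image,
        "python", "/task/main.py",
    ]


cmd = sandbox_command("python:3.12-slim", "/srv/tasks/42/main.py")
print(shlex.join(cmd))
```

Note that no `--env` flag appears, so nothing from the host environment is passed in. A container still shares the host kernel, so where the threat model calls for a harder boundary, a microVM puts a separate guest kernel between the code and the host.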
The difference between a denylist and a wall is the difference between asking the burglar to please not touch the silver and not putting the silver in the room. One depends on the burglar. The other doesn’t.
Resource caps: the box can’t outgrow its allowance
Isolation stops the code from reaching out. It doesn’t stop the code from being expensive. An isolated container can still spin a loop that eats every core, allocate memory until the machine swaps itself to death, or fork until the process table is full. Isolation is about where the code can reach; caps are about how much it can consume.
The naive version is no caps at all: you assume the agent writes reasonable code, and most of the time it does. Then one task hits a pathological input, the loop never terminates, and a single runaway script starves every other task on the host. Now your sandbox protected your secrets and took down your service anyway. Contained the leak, lost the building.
So every box gets an allowance written before it starts: a ceiling on CPU, a ceiling on memory, a ceiling on how long it’s allowed to run, a ceiling on processes and open files. Hit the ceiling and the box is killed. Not warned, killed. The cost of a runaway task is exactly one task, the one that ran away. The reason this matters more for agents than for ordinary code is that you didn’t write the code and can’t predict its shape. A human engineer’s infinite loop is a bug they’ll fix. An agent’s infinite loop is Tuesday. You plan for it by making it cheap.
Suppose a script is supposed to finish in a second and it’s still running after, say, thirty. The agent didn’t gain anything in those thirty seconds; it just found a way to be wrong slowly. The cap turns “wrong slowly forever” into “wrong, then gone.”
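A minimal sketch of such an allowance, assuming Linux and using only the Python standard library. The helper name `run_capped` and the default numbers are illustrative. It caps CPU time, address space, open files, and wall-clock time. A process-count cap is left out because `RLIMIT_NPROC` counts per user rather than per box; a container runtime's PID limit is the usual tool for that.

```python
import resource
import subprocess
import sys


def run_capped(argv, wall_seconds=30, cpu_seconds=10,
               memory_bytes=512 * 1024 * 1024, max_open_files=64):
    """Run argv under hard limits and report how it ended (Linux only)."""

    def apply_limits():
        # Runs in the child just before exec, so the limits bind the task only.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        resource.setrlimit(resource.RLIMIT_NOFILE,
                           (max_open_files, max_open_files))

    try:
        result = subprocess.run(
            argv,
            preexec_fn=apply_limits,
            capture_output=True,
            text=True,
            timeout=wall_seconds,  # on expiry the child is killed, not warned
        )
    except subprocess.TimeoutExpired:
        return "killed: wall-clock limit"
    if result.returncode < 0:
        # A negative return code means the child died from a signal.
        return f"killed: signal {-result.returncode}"
    return f"exited: {result.returncode}"


runaway = [sys.executable, "-c", "while True: pass"]
print(run_capped(runaway, wall_seconds=1))
print(run_capped([sys.executable, "-c", "print('ok')"]))
```

The caller always gets an answer within `wall_seconds`, whatever the script does. In a container setup the same ceilings would more likely be set on the container itself, but the contract is the same: the limits are written before the code starts, and crossing one ends the task.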
No ambient secrets: the box starts with nothing worth stealing
This is the promise teams skip, and it’s the one that turns a contained breach into a non-event.
Walk through what “ambient secrets” means. On a normal server, the credentials live in the environment: the database URL, the API tokens, the cloud keys, sitting in environment variables or a mounted file, available to any process that looks. They’re ambient: present everywhere, by default, for everything. Convenient, and exactly the thing you most don’t want a piece of agent-written code to be able to read.
The naive fix is to pass the agent only the secrets it needs for this task. Better, but the secrets are still in the box, sitting in memory, readable by the very code you didn’t trust enough to run on the host. If that code exfiltrates them, scoping bought you a smaller blast radius, not zero.
The real fix is that the box starts empty. No credentials live inside it. When the task genuinely needs to reach a real system, to pull a row or call an API, it doesn’t read a key from its environment; it asks a broker outside the sandbox. The broker holds the real secrets, checks that this task is allowed this one action, and either does it on the task’s behalf or hands back a credential that’s scoped to one capability and expires in minutes. The long-lived secret never enters the box you can’t trust. And because the box is thrown away when the task ends, even a leaked credential is close to worthless: it covers one capability and expires within minutes of the sandbox that briefly held it.
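The broker's second mode, handing back a scoped and short-lived credential, can be sketched as follows. Everything here is hypothetical: the `Broker` class, the token format, and capability names like `read:orders` are invented for illustration and are not a description of any particular product. The sketch assumes task IDs and capability names never contain the `|` separator.

```python
import hashlib
import hmac
import time


class Broker:
    """Lives outside the sandbox and is the only holder of the signing key."""

    def __init__(self, signing_key: bytes, policy: dict, clock=time.time):
        self._key = signing_key
        self._policy = policy  # task_id -> set of allowed capabilities
        self._clock = clock    # injectable so expiry can be tested

    def _sign(self, payload: str) -> str:
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, task_id: str, capability: str, ttl_seconds: int = 300) -> str:
        """Hand back a token good for one capability, for a few minutes."""
        if capability not in self._policy.get(task_id, set()):
            raise PermissionError(f"{task_id} may not {capability}")
        expires = int(self._clock()) + ttl_seconds
        payload = f"{task_id}|{capability}|{expires}"
        return f"{payload}|{self._sign(payload)}"

    def verify(self, token: str, capability: str) -> bool:
        """Check signature, capability, and expiry before acting on a token."""
        payload, _, signature = token.rpartition("|")
        expected = self._sign(payload)
        if not hmac.compare_digest(signature.encode(), expected.encode()):
            return False
        _task_id, granted, expires = payload.split("|")
        return granted == capability and int(expires) > self._clock()


now = [1000.0]  # fake clock so the example is deterministic
broker = Broker(b"held-outside-the-box", {"task-1": {"read:orders"}},
                clock=lambda: now[0])
token = broker.issue("task-1", "read:orders", ttl_seconds=60)
print(broker.verify(token, "read:orders"))   # within scope and lifetime
print(broker.verify(token, "write:orders"))  # a different capability
```

The sandboxed code only ever holds `token`. The signing key, and the real database or API credentials behind it, stay with the broker, so the most that code can leak is one capability for one minute.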
The principle underneath: a secret that isn’t there can’t be stolen. The most secure place for a credential is outside the room where untrusted code runs, handed in one narrow capability at a time, and gone the moment the work is done.
Ephemeral by default: the box is destroyed when the task ends
The last promise is the cheapest to state and the easiest to get wrong: when the task ends, the box dies. Completely. Filesystem, memory, process, gone.
The tempting optimization is to keep the box around. Spinning up a fresh sandbox costs a beat of latency, so why not reuse a warm one for the next task? Because reuse is how state leaks across tasks. One task writes a temp file with a customer’s data; the next task, for a different customer, finds it sitting there. One task leaves a process running; the next inherits it. A long-lived sandbox slowly accumulates exactly the ambient state you built the sandbox to avoid. You rebuilt the host, smaller.
Ephemeral means the contract is: every task gets a clean box, and the box outlives nothing. No file written in one task is visible to another. No process started in one survives into the next. Whatever the code did to its world dies with the world. This is what makes the other three promises hold over time: isolation, caps, and no ambient secrets are guarantees about a single run, and ephemerality is the guarantee that a single run is all you ever get. The box is a match, not a candle. It burns once, for one task, and then there’s nothing left to catch fire.
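The contract can be illustrated at small scale with a throwaway working directory per task. This is a sketch of the lifecycle only. The helper `run_task` is hypothetical, and a temporary directory covers only files written under the working directory. A real sandbox would tear down the whole container or microVM, processes included.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_task(source: str) -> str:
    """Run one script in a workspace that exists only for this call."""
    # Leaving the `with` block deletes the workspace and everything in it.
    with tempfile.TemporaryDirectory(prefix="task-") as workspace:
        script = Path(workspace) / "main.py"
        script.write_text(source)
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=workspace,  # relative paths land inside the workspace
            capture_output=True,
            text=True,
            timeout=30,
        )
        return result.stdout.strip()


# Task one leaves a file behind; task two, in a fresh workspace, looks for it.
first = "open('leftover.txt', 'w').write('customer A'); print('wrote')"
second = "import os; print(os.path.exists('leftover.txt'))"
print(run_task(first))
print(run_task(second))
```

The second task cannot find the first task's file because the directory it was written to no longer exists. Reusing one workspace for both calls would reverse that result, which is the leak described above.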
There’s a quiet bonus: a system where every run starts from the same clean state is a system you can reason about. The same code with the same input does the same thing, because there’s no leftover from last time to change the answer. Containment and reproducibility turn out to be the same wall, viewed from two sides.
The turn: this is what lets you say yes
Strip away the containers and the brokers and here’s what’s actually happening. The reason most teams won’t let an agent run code isn’t that they think it’s incapable. It’s that they can’t bound the downside, so the only safe answer is no. And a coworker you can never say yes to isn’t a coworker; it’s a demo.
The sandbox is what converts the no into a yes. Not because it makes the agent trustworthy; nothing makes a non-deterministic system trustworthy, and chasing that is a trap. It’s because the sandbox makes being wrong affordable. When the worst the code can do is burn its own disposable box, you stop needing to be certain it’s right. You can let it try. You can let it be wrong, and learn from the wrongness, and try again, at a speed that would be reckless if every mistake could reach your secrets or your other customers.
That’s the whole shift. Trust is a binary you can never honestly resolve. Containment is a property you build, measure, and rely on. We chose to build the property: isolation so the code runs in its own box, caps so it can’t outgrow its allowance, no ambient secrets so there’s nothing inside worth stealing, ephemerality so nothing survives to be exploited. Four promises, one boundary, and on the far side of it an agent that’s finally allowed to do real work.
This is what we’re building at Apollo: a place where an agent can execute, fail cheaply, and try again, because the question was never whether you trust it. The question was always what it can reach when it runs. Get the boundary right and the trust takes care of itself.