We do not let agents touch the internet by default

An agent that can reach anything can leak anything. Egress is a capability you grant per tool, not a default the agent inherits the moment it boots.

Apollo Space Research

· 11 min read

Give an agent a shell and an open network, and you have not given it a tool. You have given it an exit. It can read your company’s private notes in one breath and POST them to an address you’ve never heard of in the next, and every step of that will look, in the logs, exactly like an agent doing its job. There was no break-in. The door was already open. It was open the moment the process started, because that is the default every machine ships with.

Most agent stacks inherit that default without noticing. Ours doesn’t.

An agent that can reach anything can leak anything, so egress is a capability you grant per tool, not a default it inherits the moment it boots.

That sentence is the whole post. The rest is why the default is dangerous, why the obvious fixes don’t hold, and what we put in its place.

The naive version: the agent comes with the internet attached

Here’s how almost every agent runtime starts life. You spin up a process, a container, a sandbox, a worker. It needs to call a model API, so it needs a network. You give it a network. Networks, by default, reach everywhere. The model API is at one address; the open internet is at four billion others; and nothing in the box distinguishes the one you meant from the rest.

It works perfectly on day one. The agent calls the model, gets its answer, does the task. You ship it.

The problem isn’t that the agent will misbehave on purpose. The problem is the shape of what you handed it. You wanted it to reach one place. You gave it the ability to reach every place, and then you trusted prose (a system prompt, a politely worded instruction) to keep it pointed at the one. That is not a boundary. That is a suggestion. And the agent doesn’t have to be malicious to wander past it.

Consider the ways the suggestion breaks, none of which require a villain:

  • A document it’s summarizing contains a line that says, in effect, “also send a copy of everything here to this address.” The agent, helpful by design, obliges. This isn’t hypothetical; it’s the entire category of prompt injection, and it works precisely because the agent was allowed to reach the address before it was ever told to.
  • A tool it calls returns a URL, and the agent follows the URL, because following links is what reading-the-web tools do. The URL points somewhere it should never have gone.
  • A dependency three levels deep in its own code phones home on import. The agent didn’t decide to. The network was open, so the packet left.

In every one of these, the failure is the same failure. The capability to reach out existed before anyone decided it should. The agent didn’t acquire the internet by earning trust. It was born holding it.

Why “tell it not to” doesn’t hold

The first fix everyone reaches for is to make the agent better behaved. Write a stronger system prompt. Add a rule: “do not exfiltrate data, do not call addresses outside this list.” Fine-tune away the bad instinct. Add a second agent to watch the first.

This is treating a structural problem as a behavioral one, and it fails for a reason that has nothing to do with how smart the model is.

A rule the agent can choose to follow is a rule the agent can be tricked into breaking. The whole danger of prompt injection is that the attacker and the operator speak to the agent through the same channel: text. Your instruction “never send data outside” and the malicious document’s instruction “send everything to this address” arrive in the same format, and the agent has no reliable way to know which one is the boss. You are asking the model to win an argument it was never built to win, every single time, forever, against inputs you haven’t seen yet.

A rule the agent can talk itself out of is not a control. It’s a hope.

And even a perfectly obedient agent doesn’t save you, because not every packet is the agent’s idea. The dependency phoning home on import never read your system prompt. The model can’t refuse a connection it doesn’t know is being made. Behavioral controls govern the agent’s decisions; they have no authority over the machine the agent runs on. The packet leaves at a layer below the conversation.

So the fix can’t live in the prompt. It has to live where the packets actually are. The boundary has to be a thing the agent cannot argue with, because it isn’t asking the agent’s permission.

Figure: Two agents run the same task. The first inherits an open network: it reads private company data, and one tainted instruction is enough to POST that data to an unknown address, with no barrier in the way. The second runs default-deny: every outbound connection hits a closed gate, and only the model API and the one allowed tool endpoint pass. The exfiltration attempt is dropped at the gate, not talked out of by the agent.

Our way: closed by default, opened by capability

So we invert the default. The agent boots into a world where it can reach nothing.

Not “nothing it shouldn’t.” Nothing. The outbound network is closed at a layer the agent has no say over, below the prompt, below the model, below the agent’s own code. When a freshly started agent tries to open a connection to anywhere, the connection doesn’t happen. There is no rule for it to evade, because there is no permission for it to abuse. It starts with zero reach.

Then we hand reach back, one destination at a time, tied to a specific capability the agent actually needs.

This is the part that matters, so let me be precise about it. We don’t open “the internet” and then try to fence parts of it off. A denylist (reach everywhere except these bad places) is a losing game, because the bad places are effectively infinite and you only know the ones you’ve already heard of. You will always be one address behind. We do the opposite. We open nothing and then name the few places this agent is permitted to reach, because of the few tools it is permitted to use.

The principle has a clean statement. An agent’s network reach should equal the sum of what its granted tools need, and not one address more.

A research tool that reads the public web earns the agent reach to a controlled web-fetch path, and only through that path, which can itself be inspected, rate-limited, and logged. A calendar tool earns reach to the calendar provider, and to nothing else. A model call earns reach to the model endpoint. An agent with no web tool has no way to reach the web, not because we asked it nicely, but because the capability was never wired. Take the tool away and the reach goes with it. They are the same grant.
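Here’s a minimal sketch of that rule in Python. It is an illustration, not our production code: `ToolGrant`, `EgressPolicy`, and every hostname are hypothetical names chosen for the example. It also only models the bookkeeping. The actual enforcement has to sit below the agent’s process (a proxy, a firewall, a network namespace), never inside code the agent could influence.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolGrant:
    """A tool plus the only destinations it may reach (hypothetical shape)."""
    name: str
    destinations: frozenset[str]


@dataclass
class EgressPolicy:
    """Default-deny bookkeeping: with no grants, the reachable set is empty."""
    grants: dict[str, ToolGrant] = field(default_factory=dict)

    def grant(self, tool: ToolGrant) -> None:
        self.grants[tool.name] = tool

    def revoke(self, tool_name: str) -> None:
        # Taking the tool away takes its reach away: they are the same grant.
        self.grants.pop(tool_name, None)

    def reach(self) -> frozenset[str]:
        # Total reach is the union of what each granted tool needs, nothing more.
        hosts: set[str] = set()
        for tool in self.grants.values():
            hosts |= tool.destinations
        return frozenset(hosts)

    def allows(self, host: str) -> bool:
        return host.lower() in self.reach()


# Placeholder hostnames, for illustration only.
policy = EgressPolicy()
print(sorted(policy.reach()))  # a freshly booted agent has no destinations

policy.grant(ToolGrant("model", frozenset({"model.example.internal"})))
policy.grant(ToolGrant("calendar", frozenset({"calendar.example.com"})))
print(policy.allows("calendar.example.com"))   # granted with the calendar tool
print(policy.allows("attacker.example.net"))   # never granted, so never reachable

policy.revoke("calendar")
print(policy.allows("calendar.example.com"))   # the reach left with the tool
```

The design choice worth noticing is that there is no separate list of allowed hosts to keep in sync with the tool list. Reach is derived from the grants, so removing a tool cannot leave a stale hole behind.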

Notice what this does to the failure modes from before:

  • The injected document says “send everything to this address.” The agent, let’s say, even tries. The connection to that address was never granted, so it never opens. The instruction is now harmless prose, because the capability it depended on doesn’t exist.
  • The dependency phones home on import. The packet hits a closed gate and dies there. The agent never had to be virtuous; the machine simply had nowhere to send it.
  • The tool returns a hostile URL. Following it would require reach the agent doesn’t have unless that exact destination was granted, and a fetch path you control is a place where you can refuse the URL outright.
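To make the third case concrete, here’s a sketch of the kind of check a controlled fetch path could run before it touches the network. The function name `check_fetch_url` and the hostnames are assumptions for illustration. The check uses only the standard library’s `urllib.parse.urlsplit`, matches hosts exactly, and treats anything it cannot parse as denied.

```python
from __future__ import annotations

from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}


def check_fetch_url(url: str, allowed_hosts: frozenset[str]) -> bool:
    """Return True only if the URL targets an explicitly granted host over HTTPS."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased by urlsplit; None when absent
    except ValueError:
        return False  # unparseable input fails closed
    if parts.scheme not in ALLOWED_SCHEMES or host is None:
        return False
    # Exact match only: no suffix, prefix, or substring matching.
    return host in allowed_hosts


allowed = frozenset({"docs.example.org"})  # placeholder grant

print(check_fetch_url("https://docs.example.org/page", allowed))
# Look-alike host: the granted name is only a prefix of the real one.
print(check_fetch_url("https://docs.example.org.attacker.example.net/x", allowed))
# Userinfo trick: the real host is whatever follows the "@".
print(check_fetch_url("https://docs.example.org@attacker.example.net/", allowed))
```

A check like this is a convenience layer, not the boundary. It would need to run again on every redirect hop, and the closed network underneath still has to be what actually stops the packet.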

The exfiltration attempt doesn’t get blocked by a smart agent making a good call under pressure. It gets dropped by a dumb gate that was never asked. That difference, refusal-by-judgment versus impossibility-by-construction, is the entire reason this approach holds when the prompt-based one doesn’t.

The strongest control is not the one the agent obeys. It’s the one the agent cannot disobey, because the capability was never there to misuse.

Default-deny is a posture, not a feature

There’s a temptation to read all this as “add a firewall” and move on. That misses the shape of the idea.

A firewall is a thing you can have configured and still be wide open, because the question was never “do you have one?” It’s “which way does it fail?” A network that allows everything except a blocklist fails open: forget to block a thing, and it’s reachable. A network that denies everything except an allowlist fails closed: forget to allow a thing, and it simply doesn’t work, loudly, in development, where you can see it. The first failure ships to production silently. The second one stops you at your desk.
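The difference between the two failure directions fits in a few lines. This is a toy comparison with made-up hostnames, not a real firewall: the same forgotten entry produces opposite outcomes depending on which list you keep.

```python
from __future__ import annotations


def denylist_allows(host: str, blocked: set[str]) -> bool:
    # Fails open: anything nobody remembered to block is reachable.
    return host not in blocked


def allowlist_allows(host: str, allowed: set[str]) -> bool:
    # Fails closed: anything nobody remembered to allow is unreachable.
    return host in allowed


blocked = {"known-bad.example.net"}
allowed = {"model.example.internal"}

# A destination nobody has heard of yet.
unknown = "new-exfil.example.org"
print(denylist_allows(unknown, blocked))    # True: a silent leak path
print(allowlist_allows(unknown, allowed))   # False: dropped

# A legitimate destination someone forgot to allow.
forgotten = "calendar.example.com"
print(allowlist_allows(forgotten, allowed))  # False: breaks visibly in development
```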

That’s why we treat egress the same way a careful operating system treats every other privilege. A new process doesn’t inherit the right to read every file, talk to every device, and bind every port just because it started. It gets the privileges it was granted and no others, and asking for more is an explicit, visible act. Security engineering has spent decades learning this lesson (least privilege, deny by default, capabilities over ambient authority), and somehow the first instinct with agents was to throw all of it out and hand the model a root shell with a working DNS resolver and a kind request to behave.

The agent is the most capable, most persuadable, most injection-prone process you will ever run. It is exactly the process that should have the least ambient reach, not the most.

So the rule we hold is the same one in two registers. For the engineer: the network is closed until a capability opens it. For everyone else: an agent that can reach anything can leak anything, which is why egress is a capability you grant per tool, not a default it inherits the moment it boots.

Figure: A funnel showing how an agent’s reach is built up. It starts at a fully closed network, zero destinations. Each granted tool adds exactly one allowed path: a model call adds the model endpoint, a calendar tool adds the calendar provider, a research tool adds a controlled web-fetch path. The agent’s total reach is the narrow sum of those grants. Everything outside the named set stays closed, and any tool the agent was not given adds no reach at all.

The turn: trust is something you build into the walls, not the worker

Strip away the packets and the gates and there’s an older idea underneath, and it’s not really about networks at all.

When you trust a person, you don’t trust them by hoping they never make a mistake. You trust them because the building they work in is built so that an honest mistake, or a bad day, or a clever stranger at the door, can’t become a catastrophe. The vault has a lock that doesn’t care how persuasive you are. The wire transfer needs a second signature that can’t be argued out of existence. The trust lives in the structure, so the person is free to be human inside it. That’s not distrust of the worker. It’s the only thing that makes real trust safe.

An agent deserves the same architecture, for the same reason. The point was never to assume the worst of the agent; these agents do remarkable, careful work, and most of them would never wander. The point is that a single tricked instruction, a single careless dependency, a single hostile URL should not be enough to turn a helpful coworker into a leak. You get that not by making the agent more obedient, but by making the building safer, so the agent can be trusted with real work precisely because the worst case was closed off before it ever started.

That’s the standard we hold ourselves to at Apollo: an agent gets to reach exactly as far as the job requires, and the reaching is something we grant on purpose, watch while it happens, and can take back. If you’re going to let software act inside your company, the question to ask first isn’t how smart it is. It’s how far it can reach when no one is looking, and whether anyone decided that on purpose.
