Product Thinking

Your status page should have written itself an hour ago

Incident comms are written last, by the most stressed person in the room, when they should be the first thing a system drafts the moment the signals diverge.

Apollo Space Research

Apollo Space

February 21, 2026 · 10 min read

The graph turned red at 9:14. The status page updated at 10:02. In the forty-eight minutes between, three hundred customers refreshed a green page that was lying to them, two of them tweeted, one of them opened a support ticket that said “is it just me?”, and the one engineer who could fix the outage spent eleven of those minutes not fixing it, because someone had to write the update, and they were the someone.

That gap is not a tooling gap. The monitoring worked. The alert fired. The page has an edit button. The gap is that the message a customer most needs to hear was the last thing anyone got around to, written by the person least able to spare the attention, at the exact moment attention was the scarcest resource in the building.

This post is about closing that gap from the wrong end.

The message is written last, by the worst-positioned person

Here is the order incidents actually happen in, at almost every company. The signal diverges. A human notices. The human starts debugging. Somewhere in there, a second human, or the same exhausted first human, remembers that customers exist, and that the right thing to do is tell them something. So they stop, open a doc or a status-page editor, and stare at a blank box while the incident keeps burning behind them.

Read that order again, because it’s exactly backwards. The communication has the loosest dependency on the fix: you can tell a customer “we see it, we’re on it” before you have the slightest idea what “it” is. And yet it gets scheduled behind the fix, behind the triage, behind the war-room scramble, in the one slot where the person writing it has the least bandwidth and the highest stakes. The artifact that needs the calmest author gets the most rattled one.

The rest of this post is about flipping that order: making the draft the first thing that exists, not the last.

Naive: a status page is a box a human remembers to fill

The naive incident-comms story is the one almost everyone is living in. You have a status page. It’s a content management system with a public URL. When something breaks, a human is supposed to log in, pick a severity, write a sentence, and publish. Later, they’re supposed to come back and post updates, and at the end, post the all-clear.

It works in the demo. It fails in the incident, for three reasons that all rhyme.

It fails because remembering is a human job at the worst human moment. During an outage, the page is one more thing competing for a brain that’s already overcommitted to the fix. “Update the status page” is a task with no owner the instant the on-call person’s hands are full, and during a real incident, they always are.

It fails because the blank box is slow. Even when someone remembers, they now have to compose. What’s the severity? What do we say without overpromising? Which systems are affected, exactly? That’s three decisions and a writing task, serialized, while the clock runs and customers refresh.

And it fails because the page only knows what a human types into it. The monitoring system already knew, at 9:14, that latency had tripled on the checkout path. The status page didn’t, because nothing connected the thing that detected the problem to the thing that announces it. Two systems, one wall between them, and the wall is staffed by a person under maximum stress.

The page isn’t broken. It’s just downstream of a human who is, at that moment, the wrong tool for the job.

On the naive timeline, the signal diverges and the on-call engineer must notice, triage, and only then stop to remember the status page, stare at a blank box, and compose an update, so customers stare at a stale green page for the whole gap. The page only knows what the exhausted human types into it.

The reframe: the divergence is the trigger, not the human

The fix isn’t a faster human or a better blank box. The fix is to notice that the trigger already exists, and it isn’t a person remembering.

The trigger is the divergence. The moment the signal departs from normal (latency tripling, error rate climbing, a queue backing up, a payment provider timing out) is the event. A human noticing the divergence is a second, slower, unreliable event layered on top of the real one. We’ve been wiring our comms to the slow event when the fast one was sitting right there.
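A minimal sketch of treating the divergence itself as the event, assuming a fixed baseline per signal and a simple ratio threshold. The `Signal` type, the `diverged` helper, and the 3x threshold are illustrative assumptions, not a description of any particular monitoring product:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One monitored metric. All names here are hypothetical."""
    name: str        # e.g. "checkout latency p95 (ms)"
    baseline: float  # what "normal" looks like for this metric
    current: float   # the latest observed value


def diverged(signal: Signal, ratio_threshold: float = 3.0) -> bool:
    """Return True when the current value is at least `ratio_threshold`
    times the baseline. Real detectors are usually more careful (rolling
    windows, seasonality); this only illustrates the trigger."""
    if signal.baseline <= 0:
        return False  # no usable baseline, so do not fire
    return signal.current / signal.baseline >= ratio_threshold


checkout = Signal("checkout latency p95 (ms)", baseline=200.0, current=640.0)
search = Signal("search latency p95 (ms)", baseline=150.0, current=160.0)

# The list of diverged signals is the event that kicks off everything else.
triggers = [s.name for s in (checkout, search) if diverged(s)]
print(triggers)
```

Nothing in this check waits on a person; whatever consumes `triggers` can start working the instant the list is non-empty.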

So invert it. The moment the signals diverge, a system drafts the message. Not publishes, drafts. It reads what the monitoring already knows, writes the customer-facing sentence a human would have written, picks the likely severity, lists the affected surfaces by name, and sets it in front of a human as a near-finished thing with one button: publish, or fix this first.
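What that drafting step might look like, as a rough sketch. The `Draft` type, the `draft_update` function, the severity cutoff, and the message wording are all assumptions made for illustration; the input is assumed to be a mapping from each affected surface to how far its signal has moved from baseline:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Draft:
    """A customer-facing update that has NOT been published."""
    severity: str
    affected: List[str]
    body: str
    status: str = "draft"  # creation never produces a published message


def draft_update(divergence_by_surface: Dict[str, float]) -> Draft:
    """Turn what monitoring already knows into a near-finished message.

    `divergence_by_surface` maps a surface name to current/baseline ratio.
    The 3.0 cutoff for "major" is an arbitrary illustrative choice.
    """
    worst = max(divergence_by_surface.values())
    severity = "major" if worst >= 3.0 else "minor"
    affected = sorted(divergence_by_surface)
    body = (
        "We are investigating degraded performance affecting "
        f"{', '.join(affected)}. We will post an update as soon as we know more."
    )
    return Draft(severity=severity, affected=affected, body=body)


draft = draft_update({"Payments API": 1.8, "Checkout": 3.2})
print(draft.severity, draft.status)
print(draft.body)
```

The wording deliberately promises nothing about cause or recovery time, which matches the "we see it, we're on it" kind of message that does not depend on the fix.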

The blank box becomes a full box. The remembering becomes automatic. And the eleven minutes the engineer spent composing become eleven minutes spent fixing, because the composing already happened, the instant the graph turned, by a system that doesn’t get rattled.

This is the same move Apollo makes everywhere: the system speaks first, and the human’s job shifts from “produce the thing” to “approve the thing.” A draft you reject in four seconds costs you four seconds. A blank box you have to fill during an outage costs you the outage.

Why a draft, not an auto-publish

There’s a tempting overcorrection here, and it’s worth killing before it spreads. If the system can write the update, why not let it publish the update? Skip the human entirely. Divergence in, status page out, fully automatic.

Because incident comms are exactly where you don’t want full automation, and the reason is the false positive. Monitoring fires on things that aren’t incidents: a deploy blip, a noisy health check, a regional hiccup that self-heals in ninety seconds. An auto-publishing system turns every one of those into a public “we’re having problems” that scares customers about an outage that never was. You’d train your users to ignore the page, which is worse than not having one.

The draft is the right unit because it splits the work along the line where machines and humans are each strong. The machine is good at the part that’s slow and mechanical under stress: noticing instantly, reading the signals, composing the sentence, assembling the affected-systems list. The human is good at the part that’s fast and high-judgment: is this real, and do we want to say it out loud right now? One glance, one decision, one button.
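The split can be made structural rather than a matter of policy. In the sketch below, which uses hypothetical names throughout (`StatusDraft`, `publish`, `dismiss`), there is simply no code path from a draft to a public message that does not pass through a named human approver:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusDraft:
    body: str
    status: str = "draft"              # "draft" -> "published" or "dismissed"
    approved_by: Optional[str] = None  # who pressed the button


def publish(draft: StatusDraft, approved_by: Optional[str]) -> StatusDraft:
    """Publishing requires a human approver; the machine cannot self-approve."""
    if not approved_by:
        raise PermissionError("a human must approve before anything goes public")
    draft.status = "published"
    draft.approved_by = approved_by
    return draft


def dismiss(draft: StatusDraft) -> StatusDraft:
    """The false-positive path: the draft is dropped and customers see nothing."""
    draft.status = "dismissed"
    return draft


real = StatusDraft(body="We are investigating slow checkout.")
noisy = StatusDraft(body="Elevated errors on a health-check endpoint.")

try:
    publish(real, approved_by=None)  # the system trying to publish on its own
    blocked = False
except PermissionError:
    blocked = True

publish(real, approved_by="on-call engineer")
dismiss(noisy)
print(blocked, real.status, noisy.status)  # True published dismissed
```

A deploy blip costs one dismissal here instead of a public false alarm.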

The machine writes the draft. The human owns the publish.

That division is the whole design. It’s not “should we let the AI run the incident”; nobody serious wants that. It’s “should the human start from a blank box or a near-final draft, during the one moment they have the least time to write?” Stated that way, it isn’t a hard call.

And the draft keeps getting better at being a draft. The same system that watches the signals also watches which drafts the human published as-is and which they rewrote, so the next incident’s draft sounds more like your team and less like a robot, without anyone tuning a template.
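The post does not say how that feedback is measured, so the following is only one plausible signal, sketched with Python's standard `difflib`: compare each draft with what the human actually published, and track how often it went out untouched. The function names and the sample history are made up for illustration:

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(drafted: str, published: str) -> float:
    """1.0 means the human published the draft unchanged; lower values
    mean more of it was rewritten."""
    return SequenceMatcher(None, drafted, published).ratio()


def as_is_rate(history: List[Tuple[str, str]]) -> float:
    """Fraction of (drafted, published) pairs that went out untouched."""
    if not history:
        return 0.0
    unchanged = sum(1 for drafted, published in history if drafted == published)
    return unchanged / len(history)


# Hypothetical history of (what the system drafted, what the human published).
history = [
    ("We are investigating slow checkout.",
     "We are investigating slow checkout."),
    ("We are investigating slow checkout.",
     "Checkout is slow for some customers. We're on it."),
]

scores = [similarity(drafted, published) for drafted, published in history]
print(as_is_rate(history))  # 0.5
```

A rising as-is rate over successive incidents would suggest the drafts are converging on the team's voice; the heavily rewritten pairs are the ones worth learning from.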

Two timelines from the same red graph. On the left, the human is the trigger: notice, remember, compose, publish, with minutes of stale green in between. On the right, the divergence is the trigger: a system drafts the customer message the instant the signal departs from normal and hands a near-final update to a human, whose only job is one glance and one button: publish, or fix first.

The same gap, everywhere a human is the messenger

Once you see the shape, you see it isn’t really about status pages. The status page is the most visible instance of a pattern that runs through every company: a message that should travel at the speed of the event travels at the speed of a tired human remembering to send it.

The internal version is the war-room ping that goes out twenty minutes late, so half the company debugs a problem the other half already knows about. The stakeholder version is the executive who hears about the outage from a customer instead of from their own team, because the update that should have reached them was queued behind the firefight. The follow-up version is the postmortem that’s “still being written” three weeks later, when the only people who remember the timeline have moved on to the next fire.

Every one of those is the same defect. The communication is treated as a thing a person produces after the real work, when it could be a thing a system drafts from the real work, the moment the work begins. Suppose a typical incident eats, say, thirty minutes of an engineer’s attention on comms: the war-room update, the status page, the stakeholder note, the timeline for the eventual writeup. That’s thirty minutes of your sharpest person not fixing the thing they’re uniquely able to fix. Not because the writing was hard. Because nothing started it for them.

The cost isn’t the typing. The cost is when the typing happens: at the peak of the incident, by the person on the critical path, instead of at the trough of the event, by a system that was never on the critical path at all.

The turn: the calmest voice in the room shouldn’t be a person

Here’s the part that isn’t about incidents.

The reason this gap hurts is that we’ve quietly decided the most stressed person in the room should also be the company’s voice in its worst moment. We hand the writing of the customer-facing message, the one that decides whether trust survives the outage, to the human whose hands are deepest in the failure and whose adrenaline is highest. We call that ownership. It’s actually a setup for the worst message at the worst time.

The promise here isn’t “AI runs your incidents.” Nobody capable wants that, and the draft-not-publish line is there precisely to refuse it. The promise is narrower and more humane: the person who can fix the thing gets to spend the incident fixing the thing, because the message that needed writing was already written, calmly, by something that started the instant the graph turned red and waited for a human to say “yes, send it.”

A company that communicates at the speed of its events instead of the speed of its most overloaded human isn’t just a company with faster software. It’s a company where being under pressure no longer means being silent, and where the customer hears “we see it” while the engineer is still reading the trace, not forty-eight minutes after.

That’s what we’re building at Apollo Space: a company operating system where the proactive draft is the default, so the message that matters most is the first thing that exists, not the last thing someone remembers. If you’ve ever published a status update with shaking hands while the real fix waited, you already know the update should have written itself an hour ago. It was waiting on a trigger you’d already wired. It just wasn’t allowed to be the one that mattered.
