One agent is an engineering problem. Twelve is a coordination problem.

When your SDR agent and your competitor watch agent disagree about timing, who wins? Here’s how we solved it.

Apollo Space Research


The Tuesday Morning When Three Agents Fought

On a Tuesday in January 2026, three Apollo Space agents tried to contact the same prospect within a 90-minute window.

The SDR agent had scheduled a follow-up email for 9:15 AM, part of a nurture sequence it had been running for two weeks. The deal intelligence agent had detected that the prospect’s company had just announced a new funding round and wanted to send a congratulatory note referencing the news. The competitor watch agent had flagged that the prospect’s current vendor had announced a price increase, and wanted to trigger a re-positioning email.

Three agents. Three emails. One prospect. Same morning.

Any one of those emails would have been appropriate. All three, sent within 90 minutes, would have been a disaster, the kind of spammy over-communication that turns a warm prospect into a blocked sender.

This is the orchestration problem. Not “how do you build agents” but “how do you prevent agents from trampling each other.” And it’s the problem that nobody talks about in the agent hype cycle, because the demos always show one agent doing one thing, and the complexity of multi-agent coordination is invisible until you actually run multiple agents in production.

Building one agent is an engineering problem. Orchestrating twelve is a systems design problem, a game theory problem, and sometimes a political science problem. The orchestration layer is the hardest piece of Apollo Space to build, the most fragile to maintain, and the most invisible to users. When it works, no one notices. When it fails, everything falls apart.

Why Agents Conflict

Agent conflicts aren’t bugs. They’re emergent properties of a system where multiple autonomous actors operate on shared state with partially overlapping objectives.

Apollo Space’s twelve agents conflict for three reasons:

1. Resource Contention

Multiple agents want to use the same resource at the same time. The most commonly contested resource is attention: multiple agents want to contact the same person or post to the same channel. But resources also include compute (running expensive analyses), external APIs (rate limits), and human review bandwidth (only so many decisions can be escalated to humans before they’re overwhelmed).

The Tuesday morning incident was a resource conflict: three agents competing for the same prospect’s attention.

2. Recommendation Contradiction

Two agents look at the same situation and reach opposite conclusions. The SDR agent’s model says to push for a meeting this week. The deal intelligence agent says the prospect’s calendar is packed (based on their public availability signals) and recommends waiting until next week. Both agents are reasoning from valid data. They just weigh the factors differently.

This happens more often than you’d expect. In December 2025, we logged 23 recommendation contradictions across all agent pairs. The most common pair was SDR + deal intelligence (11 contradictions), followed by competitor watch + SDR (7 contradictions). The QA and code review agents almost never conflict because their domains are cleanly separated.

3. Side Effect Interference

An action by one agent changes the state that another agent depends on. The content agent publishes a blog post that references a competitor by name. The competitor watch agent, which monitors mentions of our company on competitor channels, now needs to account for the fact that any competitor response might be triggered by our content rather than being an organic signal.

Side effect interference is the subtlest form of conflict and the hardest to detect. It doesn’t manifest as two agents trying to do the same thing. It manifests as one agent’s environment being altered by another agent’s actions in ways that affect decision quality.

The Orchestration Layer

Apollo Space’s orchestration layer sits between the agents and the outside world. No agent acts directly: every action proposal passes through the orchestrator, which applies four mechanisms: priority resolution, conflict detection, consensus protocols, and human escalation.

Priority Resolution

Every action proposal has a priority score. The score is calculated from three components:

Base priority: Each agent has a static priority level that reflects its domain’s criticality. The observability agent has the highest base priority (system health is always urgent). The content agent has the lowest (content can almost always wait).

Agent                Base Priority
Observability        95
QA                   85
Budget Monitor       80
Deal Intelligence    75
SDR                  70
Competitor Watch     70
Code Review          65
Post-Sale Health     65
Team Intelligence    60
Meeting Digest       55
Content              50
Custom Agents        Variable

Urgency modifier: Time-sensitive actions get a boost. An email that needs to go out before a meeting in 2 hours gets +20 urgency. A weekly report that can wait until tomorrow gets +0. Urgency is calculated based on time-to-deadline and decay rate (how quickly the action’s value decreases with delay).

Impact modifier: High-impact actions get a boost. An action affecting a $200K deal gets a higher impact modifier than an action affecting a $10K deal. Impact is estimated from the action’s expected business value, which the agent reports as part of its action proposal.

The composite priority score is base_priority + urgency_modifier + impact_modifier. When two actions conflict, the higher-priority action wins.

For the Tuesday morning incident, the priority scores worked out to:

  • Competitor watch alert (positioning email): 70 + 15 (time-sensitive, competitor news fades) + 18 (large deal) = 103
  • Deal intelligence (congratulatory note): 75 + 10 (funding news is current but not urgent) + 12 (relationship maintenance) = 97
  • SDR follow-up (nurture sequence): 70 + 5 (scheduled, not urgent) + 8 (standard pipeline) = 83

The competitor watch email won. The deal intelligence email was rescheduled for Wednesday. The SDR follow-up was pushed to Thursday. The orchestrator enforces one email per prospect per day as a hard constraint.
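To make the arithmetic concrete, here is a minimal sketch of the scoring and the one-email-per-day rule applied to the Tuesday incident. The `Proposal` class, the `schedule_one_per_day` function, the prospect identifier, and the exact calendar date are illustrative assumptions, not the production implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Proposal:
    agent: str
    prospect: str
    base: int     # static per-agent priority from the table above
    urgency: int  # urgency modifier
    impact: int   # impact modifier

    @property
    def score(self) -> int:
        # Composite score: base_priority + urgency_modifier + impact_modifier.
        return self.base + self.urgency + self.impact


def schedule_one_per_day(proposals: list[Proposal], start: date) -> dict[str, date]:
    """Highest score sends first; each later proposal for the same prospect slips a day."""
    next_free: dict[str, date] = {}  # prospect -> next day with no email scheduled
    plan: dict[str, date] = {}       # agent -> send date
    for p in sorted(proposals, key=lambda p: p.score, reverse=True):
        day = next_free.get(p.prospect, start)
        plan[p.agent] = day
        next_free[p.prospect] = day + timedelta(days=1)
    return plan


# Assumed date: the incident is only described as "a Tuesday in January 2026".
tuesday = date(2026, 1, 6)
proposals = [
    Proposal("sdr", "prospect-1", base=70, urgency=5, impact=8),
    Proposal("deal_intelligence", "prospect-1", base=75, urgency=10, impact=12),
    Proposal("competitor_watch", "prospect-1", base=70, urgency=15, impact=18),
]
plan = schedule_one_per_day(proposals, tuesday)
for agent, day in plan.items():
    print(agent, day.strftime("%A"))
```

With these inputs the competitor watch email keeps Tuesday, the deal intelligence note moves to Wednesday, and the SDR follow-up moves to Thursday, matching the outcome described above.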

Conflict Detection

Before any action executes, the orchestrator checks for conflicts with pending and recently executed actions. The conflict detection system maintains a state graph that tracks:

  • Entity states: Which prospects have been contacted recently, which channels have been posted to, which PRs are under review
  • Pending actions: All action proposals that have been submitted but not yet executed
  • Cool-down windows: Minimum time between interactions with the same entity (e.g., 24 hours between emails to the same prospect)

Conflict detection runs in constant time using a hash-based lookup on entity identifiers. When a new action proposal arrives, the orchestrator checks:

  1. Is there a pending action targeting the same entity? (Resource contention)
  2. Is there a recently completed action that triggers a cool-down? (Cool-down violation)
  3. Is there a pending action that contradicts this proposal? (Recommendation contradiction)

If any check fails, the conflict resolution process kicks in.
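The three checks can be sketched with plain dictionaries keyed by entity identifier, which is what keeps each lookup constant-time on average. Everything here is an assumption for illustration: the `StateGraph` class, the intent labels, and the `CONTRADICTS` table are hypothetical, and only the 24-hour cool-down comes from the example above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical pairs of intents that count as contradicting each other.
CONTRADICTS = {frozenset({"push_meeting", "delay_meeting"})}


@dataclass
class StateGraph:
    cooldown: timedelta = timedelta(hours=24)
    # entity id -> list of (agent, intent) proposals not yet executed
    pending: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    # entity id -> time of the most recent completed interaction
    last_contact: dict[str, datetime] = field(default_factory=dict)

    def check(self, agent: str, entity: str, intent: str, now: datetime) -> list[str]:
        """Return the name of every check that fails for this proposal."""
        conflicts = []
        others = [(a, i) for a, i in self.pending.get(entity, []) if a != agent]
        if others:  # 1. another agent already targets this entity
            conflicts.append("resource_contention")
        last = self.last_contact.get(entity)
        if last is not None and now - last < self.cooldown:  # 2. too soon
            conflicts.append("cooldown_violation")
        if any(frozenset({intent, i}) in CONTRADICTS for _, i in others):  # 3.
            conflicts.append("recommendation_contradiction")
        return conflicts


graph = StateGraph()
now = datetime(2026, 1, 6, 9, 0)
graph.pending["prospect-1"] = [("deal_intelligence", "delay_meeting")]
graph.last_contact["prospect-1"] = now - timedelta(hours=3)
found = graph.check("sdr", "prospect-1", "push_meeting", now)
```

Returning every failed check, rather than stopping at the first, is a design choice of this sketch: a contradiction is always also a contention on the same entity, so the resolution step needs to see both.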

The conflict detection system has caught an average of 31 conflicts per week since deployment. The majority (68%) are cool-down violations: agents trying to contact the same person too frequently. About 22% are resource contentions: multiple agents wanting the same time slot or communication channel. The remaining 10% are recommendation contradictions.

Consensus Protocols

For recommendation contradictions, cases where two agents disagree about what to do, the orchestrator runs a lightweight consensus protocol.

The protocol depends on the stakes:

Low stakes (impact modifier < 10): The higher-priority agent wins. No discussion, no deliberation. For low-stakes decisions, speed is more valuable than consensus.

Medium stakes (impact modifier 10-25): The orchestrator presents both agents’ reasoning to a tiebreaker, typically the Director agent responsible for that domain (Growth Director for sales-related conflicts, Ops Director for engineering-related conflicts). The Director makes a decision based on the combined reasoning within one decision cycle.

High stakes (impact modifier > 25): Human escalation. Both agents’ reasoning, the Director’s recommendation, and the relevant context are packaged into an escalation and routed to the appropriate human. The human decides.
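The three tiers reduce to a small routing function. This is a sketch under stated assumptions: the function names and the dictionary shape are hypothetical, and using the larger of the two impact modifiers as "the stakes" is our guess at a reasonable rule rather than something specified above.

```python
def resolution_path(impact_modifier: float) -> str:
    """Map an impact modifier to the tier described above."""
    if impact_modifier < 10:
        return "priority"  # low stakes: higher-priority agent wins
    if impact_modifier <= 25:
        return "director"  # medium stakes: Director agent breaks the tie
    return "human"         # high stakes: escalate to a person


def resolve(a: dict, b: dict) -> tuple:
    """Return (path, winning agent or None if someone else must decide)."""
    # Assumption: the stakes of a contradiction are the larger impact modifier.
    path = resolution_path(max(a["impact"], b["impact"]))
    if path == "priority":
        winner = a if a["score"] >= b["score"] else b
        return path, winner["agent"]
    return path, None


low = resolve(
    {"agent": "sdr", "score": 83, "impact": 8},
    {"agent": "competitor_watch", "score": 79, "impact": 4},
)
medium = resolve(
    {"agent": "sdr", "score": 83, "impact": 8},
    {"agent": "deal_intelligence", "score": 97, "impact": 12},
)
```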

In practice, about 70% of contradictions are low-stakes and auto-resolved by priority. About 25% are medium-stakes and resolved by a Director agent. About 5% are high-stakes and escalated to humans. The goal is to keep the human escalation rate low enough that humans aren’t overwhelmed but high enough that genuinely important decisions get human judgment.

The consensus protocol adds latency: about 2-4 seconds for a Director resolution, and minutes to hours for a human escalation. For most agent actions, this latency is acceptable. For real-time agents (observability, QA), the protocol has a fast path: if the action is within the agent’s autonomous scope (as defined by the trust architecture), it bypasses consensus entirely.

Human Escalation

Human escalation is the safety valve. When the orchestrator can’t resolve a conflict through priority or consensus, or when an action exceeds the trust architecture’s autonomous scope, it escalates to a human.

Escalations are structured, not free-form. A human receives:

  1. The decision: What the agent wants to do
  2. The reasoning: Why the agent wants to do it (extracted from the decision loop)
  3. The conflict: If relevant, what the contradicting agent proposed and why
  4. The recommendation: The orchestrator’s suggested resolution
  5. The deadline: How long until the action’s value degrades (urgency context)

The human can approve, modify, or reject. Their decision is logged and fed back into the agents’ episodic memory so similar situations are handled better in the future.
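One way to keep escalations structured is to give them a fixed shape. The sketch below is an assumption about what such a record could look like: the `Escalation` and `Verdict` names, the field types, and the example values (including the date) are illustrative only.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from enum import Enum
from typing import Optional


class Verdict(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"


@dataclass
class Escalation:
    decision: str        # what the agent wants to do
    reasoning: str       # why it wants to do it
    recommendation: str  # the orchestrator's suggested resolution
    deadline: datetime   # when the action's value degrades
    conflict: Optional[str] = None  # the contradicting proposal, if any

    def record(self, verdict: Verdict, note: str = "") -> dict:
        """Build the log entry that would be fed back into episodic memory."""
        entry = asdict(self)
        entry["deadline"] = self.deadline.isoformat()
        entry["verdict"] = verdict.value
        entry["note"] = note
        return entry


escalation = Escalation(
    decision="Send the pricing proposal today",
    reasoning="The pricing window is closing",
    conflict="Deal intelligence recommends waiting: the CFO is out of office",
    recommendation="Wait until the CFO returns",
    deadline=datetime(2026, 2, 10, 17, 0),  # hypothetical deadline
)
entry = escalation.record(Verdict.MODIFY, note="Send to the VP of Finance instead")
```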

We target a human escalation rate of 3-5% of all agent actions. Below 3%, and we’re probably not escalating things that should get human eyes. Above 5%, and humans are spending too much time reviewing agent decisions, which defeats the purpose of having agents.

Our current escalation rate is 4.1%, which is right in the target range. Of escalated decisions, humans agree with the orchestrator’s recommendation 73% of the time. The 27% where humans disagree is the most valuable data we have, because it reveals where the orchestration logic has gaps.

Fallback Patterns

Agents fail. Models go down. APIs rate-limit. Memory retrieval stalls. The orchestration layer’s job isn’t to prevent failure. It’s to ensure graceful degradation when failure happens.

Apollo Space implements three fallback patterns:

1. Capability Degradation

When an agent fails, the system loses a capability but continues operating. If the competitor watch agent goes down, the SDR agent still sends emails; it just doesn’t have fresh competitive intelligence. If the meeting digest agent fails, meetings still happen; people just don’t get automated summaries.

This sounds obvious, but getting it right requires careful dependency mapping. Some agents depend on each other. The SDR agent’s outreach quality is better when it has competitor intelligence. The deal intelligence agent’s scoring is more accurate when it has meeting digest summaries. These are soft dependencies: they improve performance but aren’t required for operation.

Hard dependencies are different. If the observability agent can’t reach the metrics API, it can’t do its job at all. There’s no degraded mode. For hard dependencies, the fallback is alerting a human that the capability is fully offline.

We map every inter-agent dependency as either soft (degrade gracefully) or hard (fail loudly). The dependency map looks like this:

  • SDR Agent soft-depends on: Deal Intelligence, Competitor Watch, Content Agent
  • QA Agent hard-depends on: GitHub API, Test Infrastructure
  • Competitor Watch soft-depends on: Web Scraping APIs (can cache stale data for 48 hours)
  • Observability hard-depends on: Metrics API, Log API

When a soft dependency fails, the dependent agent continues with reduced quality. When a hard dependency fails, the dependent agent enters a suspended state and the orchestrator routes its responsibilities to the fallback handler.
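As a rough illustration, the dependency map and the soft/hard rule can be expressed as data plus one function. The entries come from the map above; the `Dep` enum, the `agent_state` function, and the state names are hypothetical.

```python
from enum import Enum


class Dep(Enum):
    SOFT = "soft"  # degrade gracefully
    HARD = "hard"  # fail loudly


DEPENDENCIES = {
    "SDR Agent": {
        "Deal Intelligence": Dep.SOFT,
        "Competitor Watch": Dep.SOFT,
        "Content Agent": Dep.SOFT,
    },
    "QA Agent": {"GitHub API": Dep.HARD, "Test Infrastructure": Dep.HARD},
    "Competitor Watch": {"Web Scraping APIs": Dep.SOFT},
    "Observability": {"Metrics API": Dep.HARD, "Log API": Dep.HARD},
}


def agent_state(agent: str, down: set[str]) -> str:
    """Classify an agent given the set of dependencies currently unavailable."""
    deps = DEPENDENCIES.get(agent, {})
    failed = [name for name in deps if name in down]
    if any(deps[name] is Dep.HARD for name in failed):
        return "suspended"  # orchestrator routes to the fallback handler
    if failed:
        return "degraded"   # keeps running with reduced quality
    return "healthy"


sdr_state = agent_state("SDR Agent", {"Competitor Watch"})
obs_state = agent_state("Observability", {"Metrics API"})
```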

2. Stale Data Tolerance

When an agent can’t get fresh data, how long can it operate on stale data? This varies by agent and by data type.

The competitor watch agent can tolerate stale pricing data for about 48 hours, since pricing doesn’t change hourly. Job posting data holds up longer, but after about 7 days the signal becomes unreliable.

The SDR agent can tolerate stale CRM data for about 4 hours. Beyond that, there’s a meaningful risk that the prospect’s status has changed (they might have replied to a separate thread, or another team member might have contacted them).

We encode stale data tolerance as a per-data-type configuration for each agent. When data age exceeds the tolerance threshold, the agent doesn’t crash. Instead, it reduces its confidence scores to reflect the uncertainty from stale data, and the trust architecture may route more decisions to human review as a result.
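A per-agent, per-data-type configuration might look like the sketch below. The tolerance values are the ones quoted above. The penalty curve is purely an assumption made for illustration (confidence halves for each additional tolerance window of age), as are the `TOLERANCE` table and the `adjusted_confidence` function.

```python
from datetime import timedelta

# (agent, data type) -> how old the data may be before confidence is reduced
TOLERANCE = {
    ("competitor_watch", "pricing"): timedelta(hours=48),
    ("competitor_watch", "job_postings"): timedelta(days=7),
    ("sdr", "crm"): timedelta(hours=4),
}


def adjusted_confidence(agent: str, data_type: str, age: timedelta, confidence: float) -> float:
    """Leave confidence alone within tolerance; shrink it once data is too old."""
    limit = TOLERANCE[(agent, data_type)]
    if age <= limit:
        return confidence
    # Hypothetical decay: halve confidence per extra tolerance window of staleness.
    overage = (age - limit) / limit  # timedelta / timedelta yields a float
    return confidence * 0.5 ** overage


fresh = adjusted_confidence("sdr", "crm", timedelta(hours=2), 0.9)
stale = adjusted_confidence("sdr", "crm", timedelta(hours=8), 0.9)
```

Here CRM data that is two hours old leaves the confidence untouched, while data that is eight hours old (one full tolerance window past the limit) cuts it in half.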

3. Queue and Retry

When an action fails due to a transient issue (API timeout, rate limit, temporary service outage), the orchestrator queues the action for retry with exponential backoff. The retry policy is:

  • First retry: 30 seconds
  • Second retry: 2 minutes
  • Third retry: 10 minutes
  • After three retries: the action is logged as failed and the agent is notified to replan
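The policy above can be sketched as a fixed delay schedule wrapped around the action. The `TransientError` class and the `run_with_retries` function are hypothetical names; the `sleep` parameter is injectable so the example records the waits instead of actually sleeping.

```python
import time
from typing import Callable, Tuple

RETRY_DELAYS_SECONDS = [30, 120, 600]  # 30 seconds, 2 minutes, 10 minutes


class TransientError(Exception):
    """Hypothetical marker for timeouts, rate limits, and brief outages."""


def run_with_retries(
    action: Callable[[], object],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, object]:
    """Try once, then once after each delay; report 'failed' after three retries."""
    for delay in [0, *RETRY_DELAYS_SECONDS]:
        if delay:
            sleep(delay)
        try:
            return "ok", action()
        except TransientError:
            continue
    return "failed", None  # the proposing agent is then notified to replan


# Usage: an action that times out twice, then succeeds.
attempts = {"count": 0}
waits: list = []


def flaky_send() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise TransientError("API timeout")
    return "sent"


status, result = run_with_retries(flaky_send, sleep=waits.append)
```

Only errors marked as transient are retried in this sketch; any other exception propagates immediately, which matches the idea that the orchestrator routes real failures back to the agent rather than deciding what to do itself.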

The retry queue handles about 340 transient failures per week across all agents. Most (89%) succeed on the first retry. About 7% succeed on the second. The remaining 4% either succeed on the third or fail permanently.

Permanent failures get escalated to the agent that proposed the action, which replans, either choosing an alternative action or escalating to a human. The orchestrator doesn’t make decisions about what to do when things fail. It routes the failure information to the entity that can make that decision.

The Hardest Bugs

Orchestration bugs are the hardest bugs we debug. They’re emergent, intermittent, and often non-reproducible.

Three examples:

The Priority Inversion: In December 2025, the SDR agent stopped sending follow-up emails for a week. No errors. No alerts. The root cause: the observability agent had been generating a burst of low-priority health check notifications that were filling the orchestrator’s action queue. The SDR agent’s actions were being queued behind hundreds of observability notifications. The priority system should have resolved this, but a bug in the urgency modifier calculation was assigning zero urgency to scheduled follow-ups (because their deadline was “whenever the cool-down expires,” which the system interpreted as “no deadline”). The fix was two lines: scheduled actions now get a minimum urgency score of 5.

The Phantom Conflict: In January 2026, the meeting digest agent started failing to post summaries for about 20% of meetings. The cause was a conflict detection false positive. The content agent had started publishing blog posts to the same Slack channel where meeting digests were posted. The conflict detector’s entity matching was too broad: it treated “any post to #general” as a potential conflict with “any other post to #general” and was applying a cool-down window. The fix was narrowing the entity matching to distinguish between post types, not just post destinations.
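The difference between the broad and the narrowed matching comes down to what goes into the lookup key. This sketch uses hypothetical function names and post-type labels to show the effect.

```python
def broad_key(channel: str, post_type: str) -> tuple:
    # Too broad: any two posts to the same channel collide.
    return (channel,)


def narrow_key(channel: str, post_type: str) -> tuple:
    # Narrowed: only posts of the same type to the same channel collide.
    return (channel, post_type)


def in_cooldown(recent_posts: list, key_fn, channel: str, post_type: str) -> bool:
    """True if a recent post maps to the same entity key as the new one."""
    recent_keys = {key_fn(c, t) for c, t in recent_posts}
    return key_fn(channel, post_type) in recent_keys


recent = [("#general", "blog_post")]  # the content agent just posted
blocked_before = in_cooldown(recent, broad_key, "#general", "meeting_digest")
blocked_after = in_cooldown(recent, narrow_key, "#general", "meeting_digest")
```

With the broad key the meeting digest is blocked by an unrelated blog post; with the narrowed key it goes through, while a second blog post would still hit the cool-down.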

The Consensus Deadlock: In February 2026, a high-stakes decision sat in the consensus queue for 6 hours without resolution. The SDR agent proposed sending a pricing proposal to a large prospect. The deal intelligence agent recommended waiting because the prospect’s CFO was on vacation (detected from out-of-office auto-replies). The Growth Director agent couldn’t resolve the tie because both arguments were valid: the pricing window was closing, but sending a proposal when the decision-maker was away was wasteful. The Director’s confidence was split 51/49, which was below our resolution threshold of 60%. It should have escalated to a human, but a bug in the escalation logic was checking the winning confidence (51%) against the escalation threshold instead of the margin (2%). The fix was straightforward, but the detection took hours because the system looked like it was “thinking”: no errors, no timeouts, just a pending decision.
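A minimal reconstruction of this class of bug is shown below. The 60% resolution threshold is from the incident; the 20-point escalation margin and both function names are assumptions chosen so the sketch reproduces the stuck state, not the actual orchestrator code.

```python
RESOLUTION_THRESHOLD = 0.60  # Director may decide at or above this confidence
ESCALATION_MARGIN = 0.20     # hypothetical: escalate when the options are this close


def next_step_buggy(conf_a: float, conf_b: float) -> str:
    winner = max(conf_a, conf_b)
    if winner >= RESOLUTION_THRESHOLD:
        return "resolve"
    # Bug: compares the winning confidence, not the gap between the options.
    if winner < ESCALATION_MARGIN:
        return "escalate"
    return "pending"  # neither resolved nor escalated: looks like "thinking"


def next_step_fixed(conf_a: float, conf_b: float) -> str:
    winner = max(conf_a, conf_b)
    if winner >= RESOLUTION_THRESHOLD:
        return "resolve"
    margin = abs(conf_a - conf_b)
    if margin < ESCALATION_MARGIN:
        return "escalate"
    return "pending"


before = next_step_buggy(0.51, 0.49)
after = next_step_fixed(0.51, 0.49)
```

With a 51/49 split, the buggy check compares 0.51 against the margin threshold, finds nothing to escalate, and leaves the decision pending indefinitely. The fixed check compares the 0.02 margin and escalates.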

These bugs share a common property: they don’t manifest as traditional software failures. The system is running. The agents are healthy. The metrics are green. But the coordination is broken in a way that only shows up as a business outcome: emails not sent, summaries missing, decisions delayed.

This is why orchestration observability is separate from agent observability. You need to monitor the coordination layer itself: queue depths, conflict rates, consensus latency, escalation rates, and, most importantly, the percentage of agent actions that are being delayed, modified, or blocked by the orchestrator. If that percentage deviates from the baseline, something is wrong in the coordination layer, even if every individual agent is healthy.
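The headline metric is simple to compute once the orchestrator counts its own interventions. In this sketch the function names, the sample counts, and the 25% relative tolerance are all assumptions; the appropriate baseline and tolerance would depend on your own traffic.

```python
def intervention_rate(total: int, delayed: int, modified: int, blocked: int) -> float:
    """Share of agent actions the orchestrator delayed, modified, or blocked."""
    if total == 0:
        return 0.0
    return (delayed + modified + blocked) / total


def deviates(rate: float, baseline: float, tolerance: float = 0.25) -> bool:
    """True if the rate is more than `tolerance` (relative) away from baseline."""
    return abs(rate - baseline) > tolerance * baseline


normal_week = intervention_rate(total=1000, delayed=30, modified=10, blocked=10)
odd_week = intervention_rate(total=1000, delayed=120, modified=10, blocked=10)
alert = deviates(odd_week, baseline=normal_week)
```

The check is deliberately two-sided: a rate that drops well below baseline is as suspicious as one that spikes, since it can mean conflicts are no longer being detected.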

The Design Principle

Twelve agents operating independently are chaos. Twelve agents operating under a centralized controller are brittle. The right design is in between: agents that are autonomous in their domain but coordinated at the boundaries.

Each Apollo Space agent owns its domain completely. The SDR agent decides who to contact, what to say, and when to say it. The QA agent decides what to test, how to test it, and whether to flag an issue. No other agent or system tells them how to do their job.

The orchestrator doesn’t manage agents. It manages interactions between agents. It doesn’t decide what the SDR agent should write. It decides whether the SDR agent can send that email right now, given what every other agent is doing. It’s a traffic controller, not a manager.

This principle (autonomous domains, coordinated boundaries) is what makes the system scalable. Adding a thirteenth agent doesn’t require rewriting the orchestration logic. It requires defining the new agent’s domain, its priority level, its interaction patterns with existing agents, and its fallback behavior. The orchestration framework handles the rest.

We’ve added three agents since the initial nine, and each addition took 2-3 days of orchestration configuration versus 2-3 weeks of agent development. The orchestration layer is hard to build but easy to extend, which is exactly the trade-off you want in a system that needs to grow.

The orchestration layer is invisible. Users never see it. They see twelve agents working in harmony, each doing its job at the right time. That harmony isn’t natural. It’s engineered, line by line, conflict by conflict, fallback by fallback. And it’s the hardest, most important engineering we do.
