Can an AI agent really learn a specific brand voice?

Yes, but it takes months of iterative training. The key is providing not just positive examples but explicit anti-patterns, voice rules, and continuous feedback. Most teams give up too early because the first outputs are generic.

How many training samples does a content agent need?

We found diminishing returns after about 200 samples. But the quality of samples matters more than quantity: 50 well-annotated examples, each with explicit reasoning about what makes the piece on-brand, were more valuable than 200 unannotated samples.

What metrics should you use to evaluate AI-generated content?

We found internal read rate, whether your own team voluntarily reads the content, to be the best leading indicator. If your team doesn't want to read it, your audience won't either. External metrics like engagement and conversion matter too, but they lag by weeks or months.

Teaching an AI agent your brand voice: lessons from 6 months of content

Teaching Apollo Space's content agent to write like a specific team takes months, and it starts out terrible. Internal read rates climb from the single digits into the 90s along the way. Here's what typically goes wrong, and what actually works.

Apollo Space Research

Apollo Space

August 29, 2025 · 14 min read

The Content No One Read

Training a content agent to write like a specific team almost always starts the same way. You turn the agent on, ask it to draft a blog post about agent-based operations, and twenty minutes later you get 1,800 words of technically accurate, strategically sound, utterly forgettable content.

It opens with something like: “In today’s rapidly evolving technological landscape, AI agents are transforming how businesses operate.”

Nobody finishes reading it. That’s exactly what happened with the first drafts from Apollo Space’s content agent.

It’s a content problem, and not the kind you solve by switching models or rewriting prompts. The agent can write. It can structure arguments, cite data, and produce grammatically flawless prose. What it can’t do is write like you.

The metric that captures this best is internal read rate: the percentage of the team that actually opens and reads the draft. Early on, it lives in the single digits. People skim the first paragraph and move on.

That read rate is the most important metric to track. Not because it measures content quality directly, but because it measures something more fundamental: does the content have a voice that makes you want to keep reading?

Early on, it doesn’t.
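For concreteness, here is one way the read rate described above could be computed. This is a minimal sketch, not Apollo Space's tooling: the function name and the idea of tracking the team and its readers as sets of names are assumptions made for illustration.

```python
def internal_read_rate(team: set[str], readers: set[str]) -> float:
    """Percentage of team members who opened and read the draft."""
    if not team:
        raise ValueError("team must not be empty")
    # Only count readers who are actually on the team.
    return 100.0 * len(team & readers) / len(team)


# Hypothetical ten-person team where one person read the draft.
team = {"ana", "ben", "chen", "dev", "eli", "fay", "gus", "hana", "ivo", "jo"}
print(internal_read_rate(team, {"ana"}))  # 10.0
```

How "read" gets recorded (an open event, a self-report, a reply) is left open here; the number is only as good as that signal.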

What Brand Voice Actually Is

Before explaining how to fix it, we need to define what brand voice is, because most people conflate voice with tone, and they’re different things.

Tone is situational. A support email is warm. A security advisory is urgent. A changelog is neutral. Tone changes based on context.

Voice is constant. Voice is the set of patterns that make someone recognizable regardless of what they’re writing about. It’s sentence length distribution. It’s the ratio of concrete examples to abstract claims. It’s whether you use “we” or “our team” or “the company.” It’s whether you start paragraphs with evidence or with opinions. It’s a hundred small choices that compound into something recognizable.

When people say “this doesn’t sound like us,” they’re talking about voice, not tone.

The voice a team develops organically through a year of writing blog posts, customer emails, and investor updates usually has specific characteristics that were never formalized. For Apollo Space, they were these:

Short declarative sentences mixed with longer analytical ones. Not consistently short (that’s Hemingway cosplay). Not consistently long (that’s academic). A rhythm that alternates.
Concrete before abstract. Always lead with a specific example, data point, or anecdote. Then extract the principle. Never the reverse.
Honest about failures. Don’t just share wins. Share what went wrong, what went sideways, and what you’d do differently. Not performatively, not “failure is just learning in disguise” LinkedIn-speak. Actually honest.
No hedging language. No “it’s possible that” or “it could be argued that” or “some might say.” Make claims and defend them.
No marketing superlatives. No “revolutionary,” “game-changing,” “cutting-edge,” or “state-of-the-art.” If the thing is good, the specifics should make that obvious.

None of this is usually written down. It lives in the team’s heads and in the corpus of things already published. And that’s exactly the problem.

The Template Trap

The first thing every team tries is the obvious thing: give the content agent examples. Twenty of the best blog posts, five investor updates, ten customer emails. The equivalent of telling a new hire “read these and write like this.”

It doesn’t work.

The agent reads the examples and produces content that’s stylistically averaged across all of them. It picks up some patterns (shorter paragraphs, some data citations), but it produces a flattened version of the voice that feels like a cover band playing your songs. Technically correct, emotionally empty.

The read rate climbs a little, but it’s still bad.

The problem is that examples alone don’t convey voice. They convey output. And you can produce similar output through very different voice choices. The agent ends up pattern-matching on surface features (paragraph length, vocabulary, structure) without understanding the underlying principles that generated those features.

It’s the same reason you can’t learn to paint by studying photographs of paintings. You need to understand the brush strokes, the color theory, the compositional choices. The output is a consequence of the process, and the process is what matters.

Writing the Rules Down

The next step feels ridiculous at the time: spending an entire day writing down the voice rules.

Not brand guidelines. Not a “voice and tone” deck with adjectives like “confident, approachable, authoritative.” Those are useless for AI training because they’re subjective and ambiguous. “Confident” to one person is “arrogant” to another.

What works is operational rules. Specific, testable, unambiguous. The ones Apollo Space wrote looked like this:

Rule 1: Every post opens with a specific story, example, or data point. Never an abstract statement.
Rule 2: Sentences alternate between under 15 words and over 25 words. No more than three consecutive sentences of the same length.
Rule 3: First-person plural (“we”) when describing team actions. First-person singular (“I”) only when the author is sharing a personal opinion.
Rule 4: Every claim must be followed within two paragraphs by a supporting example, data point, or anecdote.
Rule 5: No sentences starting with “In today’s,” “It’s no secret that,” “As we all know,” or any other filler opener.
Rule 6: No words from the banned list: revolutionary, game-changing, cutting-edge, state-of-the-art, leverage (as a verb), synergy, unlock (as a verb in business context), empower.
Rule 7: When discussing failures, include what specifically went wrong, what the impact was, and what changed. No vague “we learned a lot.”
Rule 8: Section headers are statements or questions, never vague topic labels. “What We Got Wrong” instead of “Challenges.” “The Data Says Otherwise” instead of “Data Analysis.”
Rule 9: End sections with a forward-looking statement or a question. Never end a section with a summary of what was just said.
Rule 10: No bulleted lists of more than five items. If you have more than five items, either prioritize or group them.

Around two dozen rules in all. Some stylistic, some structural, some about what not to do. All handed to the content agent alongside the example corpus.
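Because the rules are testable, some of them can be checked mechanically before a human ever looks at a draft. The sketch below covers three of them: the filler openers from Rule 5, part of the banned list from Rule 6, and the run limit from Rule 2. It is an illustration under stated assumptions, not the checker Apollo Space used. The function names are hypothetical, the sentence splitter is naive, "same length" is interpreted as the same band (short under 15 words, long over 25, medium otherwise), and the "as a verb" conditions on "leverage" and "unlock" are left out because they would need part-of-speech tagging.

```python
import re

# Subset of the Rule 6 list; "leverage" and "unlock" are omitted because the
# rule only bans them as verbs, which a plain word match cannot tell apart.
BANNED_WORDS = {"revolutionary", "game-changing", "cutting-edge",
                "state-of-the-art", "synergy", "empower"}
FILLER_OPENERS = ("in today's", "it's no secret that", "as we all know")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after ., ! or ? followed by whitespace."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def length_band(sentence: str) -> str:
    """Assumed reading of Rule 2: short (<15 words), long (>25), else medium."""
    n = len(sentence.split())
    if n < 15:
        return "short"
    if n > 25:
        return "long"
    return "medium"


def check_draft(text: str) -> list[str]:
    """Return human-readable rule violations found in a draft."""
    violations = []
    run, prev_band = 0, None
    for i, sentence in enumerate(split_sentences(text), start=1):
        lowered = sentence.lower()
        # Rule 5: no filler openers.
        if lowered.startswith(FILLER_OPENERS):
            violations.append(f"sentence {i}: filler opener")
        # Rule 6: no banned words (matched as whole, possibly hyphenated, words).
        for word in sorted(BANNED_WORDS):
            if re.search(rf"(?<![\w-]){re.escape(word)}(?![\w-])", lowered):
                violations.append(f"sentence {i}: banned word '{word}'")
        # Rule 2: no more than three consecutive sentences in the same band.
        band = length_band(sentence)
        run = run + 1 if band == prev_band else 1
        prev_band = band
        if run == 4:
            violations.append(f"sentences {i - 3}-{i}: four in a row in the '{band}' band")
    return violations


draft = "In today's rapidly evolving landscape, AI is revolutionary. We shipped it."
for violation in check_draft(draft):
    print(violation)
```

A check like this only catches the mechanical rules. Rules about honesty, evidence, and section endings still need a reader.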

The results get better immediately. The agent starts opening posts with anecdotes instead of platitudes. Sentence rhythm improves. The banned words disappear. The read rate jumps into the 40s.

But that still means more than half the team isn’t reading the content. And the feedback from those who do read it is consistent: “It’s better. It doesn’t feel generic anymore. But it still doesn’t feel like us.”

The Negative Example Breakthrough

The breakthrough usually comes from an accidental discovery.

One of the engineers, who had been ignoring the content drafts, reads one and replies with a single line: “This paragraph is the most Apollo Space thing I’ve ever read.” He quotes a paragraph about a deployment failure that was raw, specific, and slightly self-deprecating.

The next day, he flags another paragraph: “This is the most un-Apollo Space thing I’ve ever read.” It was a generic paragraph about “the power of AI to transform business processes.”

That gives you an idea. Instead of just giving the agent positive examples, you start giving it negative examples, specifically labeled as “do not write like this.” You go through the existing content corpus and the agent’s own drafts, annotating paragraphs as “on-brand” or “off-brand” with specific explanations of why.

The annotations look like this:

On-brand: “The agent sent 847 emails in the first month. 312 were opened. 23 got replies. 12 became meetings. A 1.4% conversion rate, respectable for cold outbound, but nothing we’d write home about.” Why: Specific data, concrete narrative, honest assessment, slight understatement.

Off-brand: “Our AI-powered outreach solution demonstrated significant improvements in key performance indicators across multiple engagement metrics.” Why: Vague, no data, marketing language, passive construction, “key performance indicators” is the exact kind of abstraction to avoid.

Annotating dozens of paragraph pairs, on-brand and off-brand versions of similar ideas, is the most time-consuming part of the entire training process.
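One way to keep these annotations usable is to store each paragraph with its label and its reason, then render the set into a plain-text block the agent can be given. The sketch below is an assumption about format, not a record of how Apollo Space stored them: the class name, field names, and label wording are hypothetical. The two example paragraphs are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedExample:
    text: str       # the paragraph being annotated
    on_brand: bool  # True for "write like this", False for "do not"
    why: str        # the specific explanation attached to the label


def render_examples(examples: list[AnnotatedExample]) -> str:
    """Format annotated paragraphs as a plain-text block for the agent."""
    blocks = []
    for ex in examples:
        label = ("ON-BRAND (write like this)" if ex.on_brand
                 else "OFF-BRAND (do not write like this)")
        blocks.append(f"{label}\n{ex.text}\nWhy: {ex.why}")
    return "\n\n".join(blocks)


pair = [
    AnnotatedExample(
        text="The agent sent 847 emails in the first month. 312 were opened. "
             "23 got replies. 12 became meetings.",
        on_brand=True,
        why="Specific data, concrete narrative, honest assessment.",
    ),
    AnnotatedExample(
        text="Our AI-powered outreach solution demonstrated significant "
             "improvements in key performance indicators.",
        on_brand=False,
        why="Vague, no data, marketing language.",
    ),
]
prompt_block = render_examples(pair)
print(prompt_block)
```

Keeping the reason next to the label is the point of the structure: the label alone is just another example, and the reason is what carries the principle.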

The impact is dramatic. After incorporating the negative examples, the agent’s content has a qualitatively different character. It starts avoiding the generic constructions without being told to on a rule-by-rule basis. It seems to internalize the principle behind the rules, not just the rules themselves.

This is where the read rate takes its biggest jump, into the 60s.

The Feedback Loop

The next step is to formalize the feedback process. Every piece of content the agent drafts goes through a structured review:

First pass: Does the opening hook make you want to keep reading? Yes/no. If no, why?
Second pass: Highlight any sentence or paragraph that feels “off-brand.” Explain why in one sentence.
Third pass: Highlight any section that feels genuinely good, better than what a human would have written on a first draft.

The reviews take about ten minutes per piece. The agent receives the feedback and incorporates it into its working memory for future drafts.
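If the three passes are captured in a consistent shape, the feedback can be tallied instead of just read. The sketch below shows one possible record format and a count of off-brand flags by category. The class names, the field names, and especially the short `tag` on each flag are assumptions added for illustration; the review process described above only requires a one-sentence reason.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class OffBrandFlag:
    excerpt: str  # the highlighted sentence or paragraph
    reason: str   # one-sentence explanation from the reviewer
    tag: str      # hypothetical short category, e.g. "hedging"


@dataclass
class DraftReview:
    hook_works: bool                                   # first pass: yes/no
    hook_note: str = ""                                # first pass: why, if no
    off_brand: list[OffBrandFlag] = field(default_factory=list)  # second pass
    standout: list[str] = field(default_factory=list)  # third pass


def top_flag_tags(reviews: list[DraftReview], top: int = 3) -> list[tuple[str, int]]:
    """Count off-brand flags by tag across reviews, most frequent first."""
    counts = Counter(flag.tag for review in reviews for flag in review.off_brand)
    return counts.most_common(top)


reviews = [
    DraftReview(
        hook_works=False,
        hook_note="Opens with an abstract claim.",
        off_brand=[OffBrandFlag("It could be argued that...",
                                "Hedges instead of claiming.", "hedging")],
    ),
    DraftReview(
        hook_works=True,
        off_brand=[
            OffBrandFlag("Some might say...", "Hedge.", "hedging"),
            OffBrandFlag("Four statistics in one paragraph.",
                         "Data crowds out the story.", "data-overload"),
        ],
        standout=["The deployment failure paragraph."],
    ),
]
print(top_flag_tags(reviews))  # [('hedging', 2), ('data-overload', 1)]
```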

Three patterns emerge from the feedback data:

Pattern 1: The agent defaults to safety under uncertainty. When it isn’t sure how to make a point, it reverts to generic, hedge-filled language. This is the single most common off-brand flag. The fix is a meta-rule: “When uncertain, be specific and concrete. Never retreat to generality.”

Pattern 2: It over-rotates on data. After data-driven writing gets emphasized, the agent starts cramming statistics into every paragraph, even when an anecdote would be more effective. A rule goes in: “Data supports stories. Stories carry meaning. Lead with the story.”

Pattern 3: It struggles with humor and self-deprecation. A team’s natural voice often includes a dry wit: not jokes, but a willingness to be slightly irreverent about its own mistakes. The agent either avoids humor entirely (safe but flat) or attempts it in ways that feel forced. The better move is to treat this as a human-only element and stop asking the agent to be funny. Instead, mark specific spots where a human editor can add voice.

The read rate climbs into the 80s.

The Anti-Pattern Library

The next step is to build what you might call the anti-pattern library. It’s a living document of specific constructions that the agent should never use, paired with preferred alternatives.

A few examples from the library:

Anti-Pattern	Why It’s Bad	Preferred Alternative
“In today’s [adjective] landscape…”	Dead opening. Signals generic content.	Start with a specific moment, metric, or decision.
“It’s worth noting that…”	Hedge. If it’s worth noting, just note it.	Delete the phrase. State the thing directly.
“[Thing] is a powerful tool for [outcome]”	Empty. Everything is a “powerful tool.”	Show the thing producing the outcome with specific data.
“This enables teams to…”	Marketing voice. Passive. Abstract.	“We used this to [specific outcome]. The result: [data].”
“Key takeaways include…”	Patronizing. Readers can identify takeaways.	End with a forward-looking claim or question.

The library grows continuously. Every time a reviewer flags an off-brand construction, it gets added to the library with the correction.
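A library like this can also be kept in a form that scans drafts automatically. The sketch below encodes the five entries from the table as regular expressions paired with their preferred alternatives. It is an illustration only: the class name, the `scan` function, and the exact regexes are assumptions, and a real library would need looser matching than these literal phrases.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AntiPattern:
    pattern: re.Pattern  # compiled regex for the construction
    why: str             # why it is off-brand
    alternative: str     # what to do instead


LIBRARY = [
    AntiPattern(re.compile(r"\bin today's (?:[\w-]+ ){1,4}landscape\b", re.I),
                "Dead opening.",
                "Start with a specific moment, metric, or decision."),
    AntiPattern(re.compile(r"\bit's worth noting that\b", re.I),
                "Hedge.",
                "Delete the phrase. State the thing directly."),
    AntiPattern(re.compile(r"\bis a powerful tool for\b", re.I),
                "Empty.",
                "Show the thing producing the outcome with specific data."),
    AntiPattern(re.compile(r"\bthis enables teams to\b", re.I),
                "Marketing voice.",
                "Say what we used it for and give the result."),
    AntiPattern(re.compile(r"\bkey takeaways include\b", re.I),
                "Patronizing.",
                "End with a forward-looking claim or question."),
]


def scan(text: str) -> list[tuple[str, str]]:
    """Return (matched phrase, preferred alternative) for every hit."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    hits = []
    for entry in LIBRARY:
        for match in entry.pattern.finditer(text):
            hits.append((match.group(0), entry.alternative))
    return hits


opening = ("In today's rapidly evolving technological landscape, "
           "AI agents are transforming how businesses operate.")
for phrase, fix in scan(opening):
    print(f"{phrase!r} -> {fix}")
```

Adding an entry is one more item in the list, which fits how the library grows: one flagged construction at a time.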

This is the least glamorous part of the process and, in retrospect, possibly the most valuable. The anti-pattern library is essentially a formalized version of editorial judgment. It captures the specific decisions that make writing good versus mediocre, and it does so in a format that an agent can operationalize.

The read rate closes in on the 90s.

Where This Lands

With months of iteration, the content agent produces first drafts that require minimal editing for voice. The structural editing (argument flow, evidence selection, section ordering) sometimes needs human adjustment. But the voice is right.

The internal read rate settles in the 90s. The remaining sliver tends to be people who don’t read blog content regardless of quality, which is fair.

Here’s what a before and after looks like. Same topic, same data points, months apart.

Before (low read rate): “AI agents are transforming business operations by automating routine tasks and enabling teams to focus on higher-value activities. Our data shows that organizations implementing agent-based workflows see significant improvements in operational efficiency, with some reporting up to 40% reduction in time spent on repetitive processes.”

After (high read rate): “We automated our weekly ops review. It used to take four people 90 minutes every Monday. Now it takes zero people zero minutes: the observability agent compiles the report overnight, the team intelligence agent flags anything that needs discussion, and by Monday morning there’s a brief waiting in Slack. Three months of data: the team reclaimed 78 hours. The meeting never came back.”

Same underlying message. Completely different voice. The second version is specific, concrete, grounded in real experience, and honest about the scope (it’s a weekly meeting, not a transformation of civilization).

What Actually Mattered

Ranking the interventions by impact:

Negative examples with annotations. The single biggest factor. Showing the agent what not to write is more powerful than showing it what to write. Constraints appear to be more learnable than aspirations.
Operational voice rules. The specific, testable rules give the agent a framework. Not subjective descriptions of voice, but concrete patterns.
The anti-pattern library. Incremental but compounding. Each new entry prevents a class of bad output forever.
The feedback loop. Important for catching drift and edge cases, but the big gains come from the structured training, not the iterative feedback.

The example corpus, the thing most teams start with and many stop at, contributes almost nothing on its own. Examples without annotation are data without labels. The agent can pattern-match on examples, but it can’t extract the principles behind the patterns without explicit guidance.

The Uncomfortable Truth About AI Content

Here’s the uncomfortable truth that most AI content discussions avoid: the hard part isn’t generating text. Language models could generate thousands of words of coherent prose years ago. The hard part is generating text that a specific human would want to read.

That’s a voice problem, not a generation problem. And voice is not a feature you can download or configure. It’s the accumulation of thousands of small editorial decisions, formalized into rules, examples, and anti-patterns, and refined through months of feedback.

Industry research consistently points to maintaining brand voice as the top challenge for teams using AI for content, above accuracy, above SEO, above volume. It also suggests that few organizations have a formalized voice training process for their AI tools.

The teams that don’t formalize their voice training are the ones producing content that sounds like every other AI-generated blog post on the internet. And their audiences can tell. Perceived authenticity, whether content feels like it was written by someone with a perspective, tends to be the strongest predictor of engagement, ahead of topic relevance, SEO optimization, and content length.

What We’d Do Differently

Knowing how it ends, you’d skip the first stage entirely. Don’t start with examples alone. Start with the voice rules and negative examples from day one. The time spent trying to train the agent on examples alone is wasted time.

You’d also invest more in the annotation process upfront. The annotated paragraph pairs are slow to produce, but they deliver the biggest single improvement. Starting with a larger batch of annotated pairs can reach high read rates much sooner.

And you’d accept earlier that some elements of voice are human-only. The dry humor, the unexpected metaphor, the perfectly timed self-deprecation: these are the fraction of voice that the agent can’t learn. Rather than chasing perfection, build the human editing step into the workflow from the start and let the agent own the majority it does well.

The content agent doesn’t write the content. It drafts the content in the team’s voice, and a human makes it sing. That division of labor (machine for consistency and scale, human for spark) is the sustainable model. Anyone telling you otherwise is selling you something that doesn’t exist yet.
