Taste over prompts: the real skill behind great AI products
Everyone is learning prompt engineering. But the real differentiator isn't how you talk to the model, it's knowing what to build and what to leave out. The gap between 'technically works' and 'delights users' is taste.
Apollo Space Research
Apollo Space
The Notification That Nobody Asked For
Our content agent used to be verbose. It would finish analyzing a competitor’s blog post and produce a report that looked like this:
Competitor Analysis: Acme Corp Blog Update
- New post published: “5 Ways to Improve Your Sales Pipeline” (March 1, 2026)
- Word count: 2,340 words
- Reading level: Grade 8.2
- Keyword density: “sales pipeline” (2.3%), “CRM” (1.8%), “automation” (1.4%)
- Estimated organic traffic potential: 2,400-3,800 monthly visits
- Social shares at time of analysis: LinkedIn (34), Twitter (12)
- Author: Sarah Chen, VP Marketing
- Sentiment: Positive/Educational
- Topics covered: pipeline management, CRM hygiene, follow-up cadence, lead scoring, automation tools
- Competitive positioning: Does not mention our category directly
- Recommended action: Consider creating a response piece targeting similar keywords with a differentiation angle
Every data point was accurate. Every metric was real. The recommended action was reasonable. And absolutely nobody read past the third line.
We know this because we tracked how reports like this were read. Verbose reports tend to see full-read rates in the low teens, in percentage terms. Most of the time, the person receiving the report glances at the title and moves on. The agent was producing correct output that was functionally useless.
The fix wasn’t a better prompt. The fix was taste.
We rewrote the agent’s output to look like this:
Acme published on sales pipelines. Standard content, nothing differentiated. No mention of AI agents. Our positioning is safe. No action needed.
Full-read rates jumped dramatically. Same underlying analysis. Same data. Radically different presentation, because someone (painfully, over several iterations) made judgment calls about what mattered and what didn’t.
That’s taste. And no prompt engineering course teaches it.
The Prompt Engineering Industrial Complex
Here’s something unpopular: prompt engineering is overrated.
Not useless, overrated. The gap between the median prompt engineer and the top-1% prompt engineer is real but shrinking. Models are getting better at interpreting mediocre prompts. Prompt optimization tools are automating the mechanical parts. The shelf life of a specific prompting technique is measured in months before the next model version renders it unnecessary.
In 2024, “chain of thought” was a revelation. In 2025, models do it by default. In 2024, “few-shot examples” were a critical skill. In 2025, models need fewer examples to generalize. The technical skill of structuring prompts is on a treadmill that keeps accelerating.
Meanwhile, the AI product landscape is drowning in technically competent, aesthetically dead products. Tools that work correctly but feel wrong. Agents that produce accurate output that nobody reads. Dashboards that surface every metric because nobody had the courage to choose which three matter.
Online learning platforms have seen “prompt engineering” courses multiply. Professional profiles by the thousands now list “prompt engineering” as a skill. The prompting guides from the major AI labs are among the most-consumed technical content on the internet.
Nobody is teaching taste.
What Taste Means in AI Products
Taste isn’t subjective preference dressed up as a skill. It’s a specific form of judgment that manifests in concrete decisions:
Knowing what to leave out. Every piece of information an agent surfaces is a decision it’s forcing on the user. “Your competitor published a blog post” forces the user to decide: is this important? “Your competitor published on a topic directly competing with your main product page” is a decision the agent already made, surfacing only what requires human attention.
Leaving things out is harder than including them. Including everything is the safe choice: you can’t be blamed for showing too much. Leaving things out requires confidence that your judgment about what matters is correct. That confidence comes from understanding your user deeply, not from engineering a better prompt.
Knowing when to stay silent. The worst AI products are the loudest. They notify you about everything because they can’t distinguish signal from noise. Every notification is equally weighted because the product doesn’t have an opinion about what’s important.
Apollo Space’s meeting digest agent processes every meeting. But it doesn’t send a report for every meeting. If a meeting consisted of routine status updates with no decisions made, no action items generated, and no significant disagreements, the agent stays silent. Nothing happened that requires attention. The absence of a notification is itself a signal: “This meeting was routine. You don’t need to think about it.”
Building this required us to define what “nothing noteworthy” looks like, a judgment call that no prompt can make because it depends on organizational context, the attendees’ roles, and the current priorities of the business.
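To make the silence rule concrete, here is a minimal sketch of the final gate, not Apollo Space's actual implementation. The `MeetingSummary` structure and its field names are assumptions made for illustration. The sketch also assumes an upstream step has already decided what counts as a decision, an action item, or a significant disagreement, which is where the real judgment lives.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MeetingSummary:
    """Hypothetical structure; field names are illustrative assumptions."""
    title: str
    decisions: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    significant_disagreements: list[str] = field(default_factory=list)


def should_send_digest(meeting: MeetingSummary) -> bool:
    """Return True only when the meeting produced something needing attention."""
    return bool(
        meeting.decisions
        or meeting.action_items
        or meeting.significant_disagreements
    )


# A routine status meeting: nothing decided, nothing assigned, no disputes.
standup = MeetingSummary(title="Weekly status")
# A meeting where one action item was recorded.
review = MeetingSummary(title="Roadmap review", action_items=["Draft Q3 plan"])

print(should_send_digest(standup))  # False
print(should_send_digest(review))   # True
```

The gate itself is trivial. Everything hard sits in how the three lists get populated for a given organization.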
Knowing the right resolution. How much detail should an agent provide? The answer is: it depends. A deal intelligence brief for a $5K prospect should be a paragraph. A brief for a $500K prospect should be a page. A brief for a $5M prospect should be a comprehensive dossier.
This is a taste decision, not a technical one. The agent could produce the same level of detail for every prospect. Choosing to scale detail with stakes is a product decision that reflects understanding of how the sales team actually works.
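Scaling detail with stakes could be sketched as a simple tiering function. The cutoffs below are assumptions: the examples above only fix three points ($5K, $500K, $5M), so where exactly a paragraph becomes a page is a product call, not something this sketch can settle.

```python
def brief_resolution(deal_value_usd: float) -> str:
    """Pick how much detail a deal brief should carry.

    The thresholds are hypothetical, chosen only so that the three
    example deal sizes fall into three different tiers.
    """
    if deal_value_usd >= 1_000_000:
        return "dossier"    # comprehensive, multi-section brief
    if deal_value_usd >= 100_000:
        return "page"       # roughly one page
    return "paragraph"      # a few sentences


for value in (5_000, 500_000, 5_000_000):
    print(value, brief_resolution(value))
# 5000 paragraph
# 500000 page
# 5000000 dossier
```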
Knowing the right tone. An agent escalating a routine question should be casual. An agent flagging a potential security vulnerability should be urgent. An agent delivering good news should be concise (nobody needs a detailed explanation of why things went well). An agent delivering bad news should provide context (people need to understand what went wrong and what to do about it).
Tone in AI products isn’t about making the agent “sound human.” It’s about matching the communication style to the context. A security alert delivered in a breezy, conversational tone is jarring. A routine status update delivered with urgency and gravity is exhausting.
The Content Agent Rewrite
Let’s walk through the specific taste decisions we made when rebuilding Apollo Space’s content agent. Not the prompts, the product decisions.
Decision 1: Reports should have a verdict, not just data.
The first version of the content agent presented data and let the user decide what it meant. This is the default instinct of engineers: give people the raw information and let them draw conclusions.
But that’s not what busy operators need. They need a conclusion: “Competitor X published something interesting” or “Competitor X published something irrelevant.” The verdict will sometimes be wrong; that’s what feedback loops are for. But a wrong verdict that can be corrected is more valuable than accurate data that requires interpretation.
We changed the content agent’s output structure from Data -> Analysis -> Recommendation to Verdict -> Supporting Evidence (if needed) -> Recommended Action (if any).
The verdict comes first. If the user agrees, they’re done in 5 seconds. If they want to verify, the supporting evidence is there. If they disagree, they provide feedback that improves future verdicts.
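One way to picture the verdict-first structure is as a small data type whose default rendering is the verdict alone. This is a sketch under assumptions: the `VerdictReport` class, its field names, and the `include_evidence` flag are hypothetical, not the agent's real schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VerdictReport:
    """Hypothetical report shape: verdict first, everything else optional."""
    verdict: str
    evidence: list[str] = field(default_factory=list)
    recommended_action: Optional[str] = None

    def render(self, include_evidence: bool = False) -> str:
        """Render the verdict, then evidence only on request, then any action."""
        lines = [self.verdict]
        if include_evidence:
            lines.extend(f"- {item}" for item in self.evidence)
        if self.recommended_action:
            lines.append(f"Action: {self.recommended_action}")
        return "\n".join(lines)


report = VerdictReport(
    verdict="Acme published on sales pipelines. Standard content, nothing differentiated.",
    evidence=["No mention of AI agents", "Does not mention our category directly"],
)

print(report.render())                       # the five-second version
print(report.render(include_evidence=True))  # for readers who want to verify
```

Making evidence opt-in is the design choice that matters here: the reader who agrees with the verdict never pays for the detail.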
Decision 2: Not all findings deserve equal weight.
The first version reported everything: new blog posts, social media updates, pricing changes, job postings, product announcements. Equal weight to all.
But in practice, certain findings are disproportionately actionable. A competitor cutting prices is urgent. A competitor publishing a blog post is usually not. A competitor posting a job for “VP of AI” is interesting. A competitor posting for a “Junior Designer” is noise.
We built a significance filter based on competitive impact. The agent evaluates each finding against the question: “Would a smart competitive strategist care about this?” If the answer is probably not, the finding is logged but not surfaced. The user can access the full log if they want, but their default view shows only significant findings.
This required us to define “significant,” which is a taste judgment. We got it wrong several times. Early versions filtered too aggressively and suppressed a competitor’s pivot into the market. Later versions filtered too loosely and surfaced every social media post. The current calibration is the result of months of feedback loops and iteration.
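The log-everything, surface-little pattern can be sketched as follows. To be clear about the substitution: the filter described above asks a judgment question about each finding, while this sketch stands in a static lookup table for that judgment. The finding kinds, scores, and threshold are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    competitor: str
    kind: str      # e.g. "pricing_change"; the vocabulary is hypothetical
    summary: str


# Assumed significance scores per finding kind (0.0 = noise, 1.0 = urgent).
SIGNIFICANCE = {
    "pricing_change": 0.9,
    "senior_hire": 0.6,
    "blog_post": 0.2,
    "junior_hire": 0.1,
    "social_post": 0.1,
}
SURFACE_THRESHOLD = 0.5  # assumed; this is the knob that gets recalibrated


def triage(
    findings: list[Finding], threshold: float = SURFACE_THRESHOLD
) -> tuple[list[Finding], list[Finding]]:
    """Return (surfaced, full_log). Nothing is discarded, only hidden by default."""
    surfaced = [
        f for f in findings
        # Unknown kinds default to 1.0 so that novel events are shown, not lost.
        if SIGNIFICANCE.get(f.kind, 1.0) >= threshold
    ]
    return surfaced, list(findings)


findings = [
    Finding("Acme", "pricing_change", "Cut entry plan price"),
    Finding("Acme", "blog_post", "Post on sales pipelines"),
    Finding("Acme", "market_pivot", "Announced a move into a new category"),
]
surfaced, full_log = triage(findings)
print([f.kind for f in surfaced])  # ['pricing_change', 'market_pivot']
print(len(full_log))               # 3
```

Defaulting unknown kinds to "surface" is a deliberate bias: a filter that has never seen an event type is the filter most likely to hide something important.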
Decision 3: The agent should adapt to the user’s consumption pattern.
Some users read every report in detail. Others skim headlines. Others only engage when something is flagged as high-priority. The content agent should adapt its output format to each user’s pattern.
For detail-oriented users: full reports with supporting data. For skimmers: bullet-point summaries with bold verdicts. For priority-only users: silence unless something is genuinely important, then a concise alert.
This wasn’t a prompt engineering challenge. It was a product design challenge. We had to build user behavior tracking, define consumption archetypes, and design three different output formats, then let the agent learn which format each user prefers based on engagement patterns.
The prompt for generating the report barely changed. The product around the prompt changed completely.
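As a rough illustration of assigning a user to one of the three formats from engagement data, consider the sketch below. The `Engagement` counters, the rate cutoffs, and the default for new users are all assumptions. The text above says the agent learns the preference from engagement patterns, and this fixed-rule version is only the simplest stand-in for that.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    """Hypothetical per-user counters over some recent window."""
    reports_sent: int
    reports_opened: int
    reports_read_fully: int


def choose_format(e: Engagement) -> str:
    """Map engagement to 'detailed', 'summary', or 'priority_only'."""
    if e.reports_sent == 0:
        return "summary"  # assumed default until there is data
    open_rate = e.reports_opened / e.reports_sent
    full_read_rate = e.reports_read_fully / e.reports_sent
    if full_read_rate >= 0.6:   # assumed cutoff: reads most reports end to end
        return "detailed"
    if open_rate >= 0.3:        # assumed cutoff: opens often, rarely finishes
        return "summary"
    return "priority_only"      # rarely opens: stay silent unless it matters


print(choose_format(Engagement(20, 18, 15)))  # detailed
print(choose_format(Engagement(20, 10, 2)))   # summary
print(choose_format(Engagement(20, 3, 0)))    # priority_only
```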
Why Taste Can’t Be Automated
Here’s the uncomfortable truth for the “AI will automate everything” crowd: taste is specifically the thing that can’t be automated.
You can automate prompt optimization. You can automate A/B testing of output formats. You can automate metric collection and analysis. But you can’t automate the decision about which metrics matter. You can’t automate the judgment about when an agent should speak up and when it should stay quiet. You can’t automate the sense of “this feels wrong” that leads you to redesign an interaction pattern even though the data says it’s working.
Taste is the product of accumulated experience, empathy for users, and the willingness to have strong opinions about what good looks like. It’s Steve Jobs deciding that the iPod should have one button when every competitor had twelve. It’s Dieter Rams deciding that a radio should look like furniture, not technology. It’s the Basecamp team deciding that their project management tool should do less than every competitor, not more.
In AI products, taste manifests as the courage to make your agent do less. To filter more aggressively. To stay silent more often. To present a verdict instead of a data dump. To design for the user’s attention as a scarce resource, not an infinite one.
Every mediocre AI product we’ve tried fails the same way: it does everything the model is capable of, rather than the subset the user actually needs. The technical capability is impressive. The product experience is exhausting.
The Taste Stack
If we had to formalize taste into a development process (which somewhat defeats the point, but bear with us), it would look like this:
Layer 1: Understand the user’s actual job. Not the job title. The actual daily workflow. What does the sales director look at first thing in the morning? What does she worry about? What information does she already have too much of? What information does she wish someone would just tell her?
This is ethnographic research, not user stories. It requires watching people work, not asking them what they want. People can’t articulate what they need from a product that doesn’t exist yet. But you can observe what frustrates them about current workflows and infer what an ideal product would eliminate.
Layer 2: Define the agent’s role in the user’s life. Is this agent a trusted advisor who proactively surfaces insights? A reliable assistant who handles routine tasks? A specialist who’s consulted for specific decisions? The role determines everything: communication frequency, level of detail, tone, initiative level.
An advisor speaks up unprompted with opinions. An assistant responds to requests and reports on task completion. A specialist waits to be asked and provides deep analysis. Most AI products never make this choice. They try to be all three simultaneously, which is exhausting for the user and confusing for the agent.
Layer 3: Design the information hierarchy. Not all outputs are equal. Define what’s critical (interrupts the user immediately), important (included in the next scheduled report), notable (logged for reference), and ignorable (filtered out entirely). This hierarchy reflects your opinion about what matters, and having that opinion is the essence of taste.
Layer 4: Iterate on feel. This is the part that can’t be systematized. Use the product. Use it every day. Notice what annoys you, what surprises you, what you wish was different. Make changes based on those observations. Use it again. The feedback loop between using the product and improving the product is where taste develops.
We use Apollo Space’s agents daily. Not as a demo. As our actual workflow. When the meeting digest agent sends something we don’t care about, we feel it as users, not as builders. That feeling, the micro-frustration of unwanted information, drives product changes that no metric would surface.
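The four-level hierarchy from Layer 3 maps naturally onto a small router. What follows is a sketch with hypothetical names (`Priority`, `Dispatcher`, and the three queues), assuming the priority has already been assigned by whatever judgment the product encodes. Assigning it is the hard part, and this code does not attempt it.

```python
from __future__ import annotations

from enum import Enum


class Priority(Enum):
    CRITICAL = "critical"    # interrupts the user immediately
    IMPORTANT = "important"  # included in the next scheduled report
    NOTABLE = "notable"      # logged for reference
    IGNORABLE = "ignorable"  # filtered out entirely


class Dispatcher:
    """Routes each message to exactly one destination, or drops it."""

    def __init__(self) -> None:
        self.interrupts: list[str] = []
        self.next_report: list[str] = []
        self.reference_log: list[str] = []

    def handle(self, message: str, priority: Priority) -> None:
        if priority is Priority.CRITICAL:
            self.interrupts.append(message)
        elif priority is Priority.IMPORTANT:
            self.next_report.append(message)
        elif priority is Priority.NOTABLE:
            self.reference_log.append(message)
        # IGNORABLE messages are intentionally dropped.


d = Dispatcher()
d.handle("Competitor cut prices", Priority.CRITICAL)
d.handle("Competitor posted a senior AI role", Priority.IMPORTANT)
d.handle("Competitor published a blog post", Priority.NOTABLE)
d.handle("Competitor posted a junior design role", Priority.IGNORABLE)

print(len(d.interrupts), len(d.next_report), len(d.reference_log))  # 1 1 1
```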
The Moat You Can’t Copy
There’s a strategic argument for taste that goes beyond user experience: taste is a moat.
Prompts can be copied. Architecture can be reverse-engineered. Features can be replicated. But taste, the accumulated product judgment that results in a hundred small decisions about what to include, exclude, emphasize, and de-emphasize, can’t be copied because it isn’t a thing. It’s the residue of thousands of hours of using, observing, and iterating on a product.
A competitor can see that Apollo Space’s content agent delivers verdicts instead of data dumps. They can copy that feature. But they can’t copy the calibration of what constitutes a “significant” finding, because that calibration is the product of months of feedback from real users in real workflows. They can copy the three output formats (detailed, summary, priority-only), but they can’t copy the logic that assigns each user to the right format, because that logic embodies our understanding of how different operators consume information.
Every taste decision is individually small and collectively irreplicable. It’s what separates the products that people tolerate from the products that people love.
The Gap Is Widening
Here’s our bet: as AI models improve, the technical gap between AI products will shrink and the taste gap will widen.
When models were primitive, technical skill was the differentiator. Getting GPT-3 to produce useful output required genuine prompt engineering expertise. The best AI products in 2022-2023 were the ones with the best engineers.
As models become more capable, and they’re becoming more capable rapidly, the marginal return on technical optimization decreases. The difference between a good prompt and a great prompt on GPT-4 was significant. The same difference on models in 2026 is smaller. The models are filling in the gaps that humans used to engineer around.
What’s left when the technical gap closes? Product judgment. The decisions about what to build, how to present it, when to surface it, and when to stay quiet. The decisions that require understanding humans, not understanding models.
The AI product landscape is about to bifurcate. On one side are technically competent products that feel like spreadsheets with chatbots: accurate, complete, exhausting. On the other are products with taste that feel like they were designed by someone who actually uses them: opinionated, selective, delightful.
The first category will compete on price and features. The second will compete on love.
We know which side we want Apollo Space to be on. And we know that getting there isn’t a matter of engineering better prompts. It’s a matter of making better choices about what our agents should and shouldn’t do, and having the conviction to ship those choices even when the data is ambiguous and the safe option is to include everything.
Taste over prompts. Always.