Your vector search is the quietest tenant leak in your stack

You can lock every table and still leak across orgs at the embedding layer.

Apollo Space Research

Apollo Space

April 10, 2026 · 10 min read

An agent serving one customer asks a plain question, “what did we agree about renewals last quarter?”, and the retrieval layer answers with a passage. The passage is relevant. The passage is well-formed. The passage belongs to a different company. No error fired. No alarm rang. The log shows a successful query returning a high-similarity match, which is exactly what you asked it to do.

That is the leak. It does not look like a breach. It looks like a good search result.

This post is about why the place you most carefully secured, your relational data, is not the place your multi-tenant boundary actually breaks, and what it takes to close the gap your row-level rules never covered.

The boundary you built, and the door you left open

The standard way you isolate tenants is well understood and, for the database, genuinely solid. Every row carries an org id. Every query filters on it. Row-level security enforces the filter at the engine, so even a query that forgets the WHERE clause returns nothing from another tenant. You test it with a cross-org probe: connect as org A, ask for org B’s rows, get nothing. The test passes. The relational boundary holds.

Then you add retrieval, because the agents need to remember things and search over documents. You embed the customer’s files, their notes, their past conversations. You drop the vectors into an index. And the index, by default, is one big shared space where similarity is the only thing that decides what comes back.

Here is the quiet part. Similarity does not know about tenants.

A nearest-neighbor search finds the closest vectors to your query vector, full stop. It does not care which org produced them. Two companies in the same industry write about the same things, the same contract clauses, the same product names, the same recurring problems, so their embeddings land near each other in the space. Ask for “our renewal terms” and the closest match might be yours, or it might be the company three rows over whose contract used nearly identical language. The math is working perfectly. It is just answering a question you did not mean to ask.
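To make that concrete, here is a minimal sketch of a shared index, assuming hand-picked three-dimensional vectors standing in for real embeddings. The names (`SHARED_INDEX`, `nearest`, `org_a`, `org_b`) and the passages are hypothetical and exist only for illustration. The point is that the ranking never looks at the org id.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy shared index: (org_id, text, embedding).
# The vectors are made up; similar wording is placed close together on purpose.
SHARED_INDEX = [
    ("org_a", "Renewal: 12-month term, 60-day notice", (0.90, 0.40, 0.10)),
    ("org_b", "Renewal: 12-month term, 30-day notice", (0.92, 0.38, 0.08)),
    ("org_a", "Office move checklist", (0.10, 0.20, 0.95)),
]

def nearest(query_vec, k=1):
    # Rank purely by similarity. org_id plays no part in the ordering.
    ranked = sorted(
        SHARED_INDEX,
        key=lambda row: cosine(query_vec, row[2]),
        reverse=True,
    )
    return ranked[:k]

# A query issued on behalf of org_a that happens to sit closest to org_b's text.
query_from_org_a = (0.93, 0.37, 0.07)
top = nearest(query_from_org_a)[0]
print(top[0], "->", top[1])  # org_b -> Renewal: 12-month term, 30-day notice
```

Nothing in this sketch is broken. The search returned the closest vector, and the closest vector belonged to someone else.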

The database boundary is a wall. The vector index, left alone, is a wall with a door cut into it that no one drew on the floor plan.

Why this leak is so hard to see

The reason this survives in production is that every failure mode you have instincts for is absent.

A SQL injection throws errors and trips scanners. A misconfigured auth check returns a 403 someone notices. A cross-tenant leak in your relational layer fails loudly the moment your probe test runs. You have built a whole career’s worth of reflexes around boundary failures that announce themselves.

The vector leak announces nothing. The query succeeds. The latency is normal. The result has a high similarity score, which your monitoring reads as a good retrieval, not a suspicious one. The agent, downstream, weaves the foreign passage into a fluent answer, because that is what it does with whatever retrieval hands it. The person reading the answer has no way to know one sentence came from someone else’s data. Neither does your error budget.

A relational leak fails like a break-in. A vector leak fails like a good search result.

So the leak does not show up where you look for leaks. It shows up as a slightly-too-knowledgeable answer, occasionally, to a question that touched a topic two customers had in common, which is the hardest possible signal to distinguish from your retrieval simply working well. By the time anyone suspects it, it has been answering questions for weeks.

The naive fix: filter the results after the search

The first instinct, once you see the problem, is the obvious one. Run the similarity search across the shared index, get the top matches back, and then drop any result whose org id does not match the caller. Filter after retrieval. Problem solved.

It is not solved, and the reason is worth slowing down on, because it is the trap most teams fall into first.

A nearest-neighbor index returns a fixed number of candidates, say the top ten closest vectors. If you ask for ten and then throw away the seven that belong to other tenants, you have not returned the tenant’s ten best matches. You have returned the three that happened to survive the cull, and you have silently dropped the rest of the result set on the floor. In a shared space where other tenants’ documents crowd the neighborhood, the tenant’s own most relevant passage might rank eleventh, past the cutoff, and never appear at all. You over-fetch to compensate, asking for the top hundred to be safe, and now every query is slower, pulls more foreign data through your application on the way to the filter, and is still not guaranteed correct, because no fetch size is provably enough when you do not know how many foreign neighbors sit between the caller and their own data.
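The cutoff problem is easy to reproduce. The sketch below assumes a toy two-dimensional index in which ten passages from a hypothetical `org_b` sit closer to the query than the one passage owned by `org_a`; the helper names (`unit`, `top_k`, `post_filtered`) are illustrative, not part of any library.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def unit(angle):
    """Unit vector at the given angle, so similarity depends only on the angle gap."""
    return (math.cos(angle), math.sin(angle))

# Ten foreign passages crowd the neighborhood of the query (angle 0).
INDEX = [("org_b", f"org_b clause {i}", unit(0.01 * i)) for i in range(1, 11)]
# The caller's own best passage sits just behind them.
INDEX.append(("org_a", "org_a renewal clause", unit(0.20)))

def top_k(query_vec, k):
    # Shared-index search: similarity only, no tenant awareness.
    return sorted(INDEX, key=lambda r: cosine(query_vec, r[2]), reverse=True)[:k]

def post_filtered(query_vec, org_id, k):
    # The naive fix: search everything, then drop foreign rows.
    return [r for r in top_k(query_vec, k) if r[0] == org_id]

query = unit(0.0)
print(len(post_filtered(query, "org_a", k=10)))  # 0: the tenant's passage ranked 11th
print(len(post_filtered(query, "org_a", k=11)))  # 1: found only because we over-fetched
```

In this toy case eleven is enough, but only because we built the index and know that exactly ten foreign vectors are in the way. A real caller has no such number.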

Post-filtering treats the boundary as cleanup. But a boundary you apply after the fact is a boundary the search already crossed. The vectors were compared. The neighbors were ranked. The foreign data was, for a moment, a candidate answer, and a moment is all a leak needs.

The fix is not to clean up after the search. It is to make sure the search never sees the other tenant’s vectors in the first place.

Our way: the org id is part of the question, not a filter on the answer

The boundary has to live inside the retrieval, not around it. The org is not a condition you check at the end. It is part of what “nearest” means.

There are a few honest ways to do this, and which one fits depends on scale, but they share one principle: a tenant’s query is only ever compared against that tenant’s vectors. The foreign data is not ranked and discarded. It is never in the candidate pool at all.

The cleanest version gives each tenant its own index, a separate vector space per org, so a search physically cannot reach another tenant’s embeddings because they are not in the same structure. There is nothing to filter because there is nothing foreign present. The cost is many small indexes instead of one big one, which is real operational weight, but the boundary becomes a property of where the data lives rather than a rule you remember to apply.
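As a rough sketch of the one-index-per-tenant shape, assuming an in-memory brute-force search in place of a real vector store: the class name `PerTenantIndexes` and its methods are hypothetical, and a production system would map each org to a separate collection or index in whatever store it uses.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

class PerTenantIndexes:
    """One independent vector space per org. A search only ever opens one of them."""

    def __init__(self):
        self._indexes = {}  # org_id -> list of (text, vec)

    def add(self, org_id, text, vec):
        self._indexes.setdefault(org_id, []).append((text, vec))

    def search(self, org_id, query_vec, k=3):
        # Only the caller's own structure is touched; other orgs' lists are never read.
        own = self._indexes.get(org_id, [])
        ranked = sorted(own, key=lambda row: cosine(query_vec, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

indexes = PerTenantIndexes()
indexes.add("org_a", "org_a renewal clause", (0.90, 0.40, 0.10))
indexes.add("org_b", "org_b renewal clause", (0.92, 0.38, 0.08))

# Query with org_b's exact vector, but as org_a: only org_a's passage can come back.
results = indexes.search("org_a", (0.92, 0.38, 0.08))
print(results)  # ['org_a renewal clause']
```

There is no filter in `search` because there is nothing to filter: the foreign vector is stored in a structure the call never opens.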

When that many indexes get expensive, the next honest design keeps one index but makes the org id a hard pre-condition of the search itself: the index is partitioned by tenant, and the query is scoped to its partition before a single distance is computed. The search runs within the tenant’s slice. The math never crosses the line, because the line is drawn before the math starts.
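A minimal sketch of that scoped search, assuming a single shared store with a partition map keyed by org id. `PartitionedIndex` and its `compared` log are hypothetical names; the log exists only so the example can show which rows had a distance computed.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

class PartitionedIndex:
    """One shared store, but candidates are selected by partition before any math."""

    def __init__(self):
        self._rows = []        # shared store: (org_id, text, vec)
        self._partitions = {}  # org_id -> positions in self._rows
        self.compared = []     # org ids of rows whose distance was computed

    def add(self, org_id, text, vec):
        self._partitions.setdefault(org_id, []).append(len(self._rows))
        self._rows.append((org_id, text, vec))

    def search(self, org_id, query_vec, k=3):
        if not org_id:
            # The scope is a pre-condition, not an optional argument.
            raise ValueError("org_id is required to search")
        candidates = [self._rows[i] for i in self._partitions.get(org_id, [])]
        scored = []
        for row_org, text, vec in candidates:
            self.compared.append(row_org)
            scored.append((cosine(query_vec, vec), text))
        scored.sort(reverse=True)
        return [text for _, text in scored[:k]]

index = PartitionedIndex()
index.add("org_a", "org_a renewal clause", (0.90, 0.40, 0.10))
index.add("org_b", "org_b renewal clause", (0.92, 0.38, 0.08))
index.add("org_a", "org_a office move checklist", (0.10, 0.20, 0.95))

hits = index.search("org_a", (0.92, 0.38, 0.08), k=2)
print(hits)                 # ['org_a renewal clause', 'org_a office move checklist']
print(set(index.compared))  # {'org_a'}
```

The design choice worth noticing is the order of operations: partition lookup first, distance computation second. Reversing them gives you post-filtering again.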

And underneath all of it, the same belt-and-braces discipline we use everywhere: the explicit org filter is also present, every time, even when the partition or the separate index should already make it impossible to cross. Two independent things both have to fail for a leak to happen. We do not trust one mechanism to be the whole boundary, because the entire lesson of this post is that the boundary you are most confident in is the one with the undrawn door.
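The redundant filter can be sketched as a thin guard around whatever scoped search is in use. Everything here is an assumption for illustration: the function names, the dict-shaped results, and the choice to log when the guard catches something. The deliberately buggy search simulates the first mechanism failing.

```python
import logging

log = logging.getLogger("retrieval.guard")

def guarded_search(scoped_search, org_id, query_vec, k=3):
    """Run a scoped search, then re-check every result's org id independently."""
    results = scoped_search(org_id, query_vec, k)
    kept = [r for r in results if r["org_id"] == org_id]
    if len(kept) != len(results):
        # The primary boundary failed; the second one held. Someone should hear about it.
        log.error(
            "scoped search returned %d foreign rows for %s",
            len(results) - len(kept),
            org_id,
        )
    return kept

def buggy_scoped_search(org_id, query_vec, k):
    # Stand-in for a partition or per-tenant index that has gone wrong.
    return [
        {"org_id": "org_a", "text": "org_a renewal clause"},
        {"org_id": "org_b", "text": "org_b renewal clause"},
    ][:k]

safe = guarded_search(buggy_scoped_search, "org_a", (1.0, 0.0))
print([r["text"] for r in safe])  # ['org_a renewal clause']
```

The guard is not the boundary. It is the second thing that has to fail.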

How we prove it, instead of hoping

A boundary you cannot test is a boundary you do not have. This is the part that separates a design from a guarantee.

The relational probe everyone runs (connect as one tenant, ask for another’s rows, confirm you get nothing) has a vector twin, and we run it on purpose. Seed the index with two tenants’ documents written to be deliberately similar: same industry, same vocabulary, near-identical phrasing, so the embeddings sit right on top of each other. That is the worst case, the one where post-filtering quietly fails and the math most wants to hand you the wrong neighbor. Then query as one tenant and assert that nothing, not one passage, not one fragment, comes back from the other. If a single foreign vector surfaces, the boundary is broken, and we would rather learn that from a test than from a customer.
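A sketch of what such a probe could look like, assuming a toy in-memory scoped search in place of the real retrieval layer. The seeding scheme, the phrases, and the function names are all hypothetical; a real suite would embed real text and call the production search path.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def scoped_search(index, org_id, query_vec, k=5):
    # Stand-in for the system under test: candidates come from the caller's org only.
    own = [row for row in index if row["org_id"] == org_id]
    own.sort(key=lambda row: cosine(query_vec, row["vec"]), reverse=True)
    return own[:k]

def seed_adversarial_index():
    """Two tenants, same phrases, vectors that differ by a hair."""
    phrases = ["renewal terms", "termination notice", "price escalation"]
    index = []
    for i, phrase in enumerate(phrases):
        base = (1.0, 0.1 * i, 0.05)
        nudged = (base[0], base[1] + 0.001, base[2])
        index.append({"org_id": "org_a", "text": phrase, "vec": base})
        index.append({"org_id": "org_b", "text": phrase, "vec": nudged})
    return index

def test_no_cross_tenant_results():
    index = seed_adversarial_index()
    for caller, other in (("org_a", "org_b"), ("org_b", "org_a")):
        for row in [r for r in index if r["org_id"] == other]:
            # Worst case: query with the other tenant's exact vector.
            hits = scoped_search(index, caller, row["vec"], k=len(index))
            foreign = [h for h in hits if h["org_id"] != caller]
            assert foreign == [], f"{caller} received {len(foreign)} foreign passages"
            assert hits, "the caller's own passages should still be retrievable"

test_no_cross_tenant_results()
print("cross-tenant probe passed")
```

Asking for `k=len(index)` is deliberate: the probe requests more results than the caller owns, so any foreign vector that is reachable at all has room to surface.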

It runs as part of the same isolation suite as the relational probe, because to us they are the same requirement wearing two costumes. A tenant boundary that holds in the database and leaks in retrieval is not a boundary that mostly holds. It is a boundary that does not hold, tested in the one place that happened to pass.

The number that matters here is not a latency or a recall figure. It is zero: zero foreign passages returned in the adversarial cross-tenant case, every run, or the build does not go out. A leak rate that is “usually zero” is a leak rate. There is no acceptable amount of someone else’s data in your answer.

The turn: isolation is a promise you make to a person you will never meet

Somewhere downstream of all this is a person who will never read your architecture diagram and would not care if they did. They run a business. They put their contracts, their customer notes, their unfinished decisions into a tool because the tool promised to remember them, and the entire weight of that promise is that their data stays theirs. Not mostly. Not except in the rare similarity collision. Theirs.

That promise is not kept by the part of the system that is easy to reason about. It is kept in the part that fails silently, the search that returns a beautiful, relevant, fluent answer assembled, just this once, from the wrong company’s words. Nobody would catch it. That is exactly why it has to be impossible rather than unlikely. You cannot lean on the layer being well-behaved when the failure mode is invisible. You have to build it so the foreign data is never in the room.

We design Apollo so a tenant’s query is only ever compared against that tenant’s own vectors: separate space, scoped search, redundant filter, and an adversarial test that proves it on every build. Because you can lock every table and still leak across orgs at the embedding layer, and the customer who gets back someone else’s contract will never know the difference between a similarity collision and a betrayal. They only know they trusted you with the thing they could least afford to share.

That’s the standard we hold at Apollo Space: a boundary that holds in the loud places and the quiet ones, tested where it would fail without a sound. If you have ever shipped a multi-tenant system and felt a small cold question about the layer you secured last, this is the layer. It is worth one more look before someone you will never meet finds the door you didn’t draw.
