You built a RAG system. The demo was magic. Then real users showed up, and it started doing the most dangerous thing an AI system can do: answering confidently, and being wrong.
Not hallucinating-from-nothing wrong. Worse — retrieved-the-wrong-source wrong. The model grounded its answer in a real chunk from your real documents. It just grounded it in the wrong one. And because the answer is fluent and citation-backed, nobody notices until it costs something.
If this sounds familiar, the problem usually isn’t your model, your prompt, or your chunk size. It’s that vector similarity is not correctness.
Similar is not the same as correct
Vector search does exactly one thing well: it finds the chunks whose embeddings are closest to your query. That’s it. “Closest in embedding space” is a decent proxy for relevance — until it isn’t.
Consider a query like “What’s the renewal policy for enterprise customers?” Your retriever happily returns the chunk about consumer renewals, because the wording is nearly identical. It’s semantically similar. It’s also wrong: the one word that mattered, enterprise vs. consumer, was diluted in the embedding and lost.
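To make the failure concrete, here is a toy sketch. It uses bag-of-words cosine similarity as a crude stand-in for an embedding model, which is an assumption made for illustration (real embeddings are far better at paraphrase), and both chunk texts are invented. The point is only the shape of the failure: the chunk that shares the most wording wins, even though it describes the wrong customer type.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "What is the renewal policy for enterprise customers?"

# Hypothetical chunks: the wrong one reuses the query's wording,
# the right one is phrased differently.
chunks = {
    "consumer": "The renewal policy for consumer customers is automatic annual renewal.",
    "enterprise": "Enterprise agreements are renegotiated by the account team before each term ends.",
}

query_vec = bag_of_words(query)
scores = {name: cosine(query_vec, bag_of_words(text)) for name, text in chunks.items()}

# Pick the top-scoring chunk, exactly as a similarity-only retriever would.
best = max(scores, key=scores.get)
print(best, scores)
```

Nothing in the scoring function knows that "enterprise" is the word that decides correctness. It is one token among eight.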
This is structural, not a tuning problem. Embeddings compress meaning into proximity. But a lot of real-world correctness lives in relationships and constraints that proximity can’t represent:
- Enterprise renewals are governed by a different policy than consumer renewals. (a hierarchy)
- “Series A” means something different in funding vs bonds. (disambiguation)
- Drug A contraindicates condition B. (a typed relationship, not a vibe)
Vector search has no idea any of these relationships exist. It only knows what reads alike. As one widely shared take put it: standard RAG retrieval surfaces what is most semantically similar to the query, not necessarily what is most complete or correct, which is a fatal distinction in high-stakes contexts.
The industry already agreed on the fix: structure
This is why 2026 became the year of GraphRAG. Instead of retrieving from an undifferentiated soup of chunks, you retrieve over a knowledge graph: entities, their types, and the typed relationships between them. Now “enterprise customer” is a node with an explicit governed_by → EnterpriseRenewalPolicy edge. Retrieval can follow meaning, not just proximity.
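As a rough sketch of what "following meaning" can look like, the snippet below stores typed edges and uses them to decide which chunks are eligible for a query. The alias table, the chunk records, and the idea of tagging each chunk with an `about` node at indexing time are all assumptions for illustration, not the API of any particular GraphRAG library.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Typed edges from the renewal example.
EDGES = [
    Edge("EnterpriseCustomer", "governed_by", "EnterpriseRenewalPolicy"),
    Edge("ConsumerCustomer", "governed_by", "ConsumerRenewalPolicy"),
]

# Hypothetical alias table: surface phrase -> graph node.
ALIASES = {
    "enterprise customer": "EnterpriseCustomer",
    "consumer customer": "ConsumerCustomer",
}

# Hypothetical chunks, each tagged at indexing time with the node it describes.
CHUNKS = [
    {"id": "c1", "about": "ConsumerRenewalPolicy",
     "text": "Consumer plans renew automatically each year."},
    {"id": "c2", "about": "EnterpriseRenewalPolicy",
     "text": "Enterprise contracts are renegotiated before each term."},
]


def follow(subject: str, predicate: str) -> set[str]:
    """Return every node reachable from `subject` over `predicate` edges."""
    return {e.obj for e in EDGES if e.subject == subject and e.predicate == predicate}


def retrieve(query: str) -> list[dict]:
    """Keep only chunks about a policy that governs an entity named in the query."""
    q = query.lower()
    entities = [node for phrase, node in ALIASES.items() if phrase in q]
    allowed: set[str] = set()
    for entity in entities:
        allowed |= follow(entity, "governed_by")
    return [chunk for chunk in CHUNKS if chunk["about"] in allowed]


hits = retrieve("What's the renewal policy for enterprise customers?")
print([chunk["id"] for chunk in hits])
```

In a real system the graph filter would typically narrow the candidate set and vector similarity would rank within it. The substring match on aliases is the simplest possible entity linker and is only here to keep the sketch short.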
But here’s the part most “add a knowledge graph to your RAG” tutorials skip:
A knowledge graph is only as good as the ontology underneath it.
The ontology is the schema of meaning — the definition of what an EnterpriseCustomer is, what types exist, which relationships are valid, what the controlled vocabulary is. Without it, your graph is just a different-shaped pile of unvalidated triples. The prerequisite for GraphRAG that actually works is a carefully curated taxonomy and ontology. That’s the unglamorous work that makes the magic reliable.
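One way to picture the difference between a validated graph and a pile of triples: treat the ontology as a schema and check every triple against it before it enters the graph. The sketch below is a minimal, assumed representation (a plain dict of known types and allowed subject/object pairs per relation), not a standard format.

```python
# Hypothetical ontology: known types, plus the (subject, object) pairs
# each relation is allowed to connect.
ONTOLOGY = {
    "types": {
        "EnterpriseCustomer",
        "ConsumerCustomer",
        "EnterpriseRenewalPolicy",
        "ConsumerRenewalPolicy",
    },
    "relations": {
        "governed_by": {
            ("EnterpriseCustomer", "EnterpriseRenewalPolicy"),
            ("ConsumerCustomer", "ConsumerRenewalPolicy"),
        },
    },
}


def validate(triple: tuple[str, str, str], ontology: dict = ONTOLOGY) -> list[str]:
    """Return a list of problems with a triple; an empty list means it is valid."""
    subject, predicate, obj = triple
    problems = []
    for term in (subject, obj):
        if term not in ontology["types"]:
            problems.append(f"unknown type: {term}")
    allowed = ontology["relations"].get(predicate)
    if allowed is None:
        problems.append(f"unknown relation: {predicate}")
    elif not problems and (subject, obj) not in allowed:
        # Both types exist, but the ontology does not permit this pairing.
        problems.append(f"{predicate} is not valid between {subject} and {obj}")
    return problems


print(validate(("EnterpriseCustomer", "governed_by", "EnterpriseRenewalPolicy")))
print(validate(("EnterpriseCustomer", "governed_by", "ConsumerRenewalPolicy")))
print(validate(("EntCustomer", "governed_by", "EnterpriseRenewalPolicy")))
```

The second and third calls are the interesting ones: a wrong pairing and a misspelled concept are both rejected up front instead of quietly becoming edges.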
So why doesn’t everyone just build the ontology?
Because the moment an engineer goes looking for how to manage an ontology, they hit a wall of bad options:
Option 1: Hand-author OWL/RDF in Protégé. Powerful, rigorous, and a genuine PhD-grade experience. It’s a desktop tool from the academic semantic-web world. It is not how a team that ships every week wants to collaborate, and it certainly isn’t API-first.
Option 2: A spreadsheet. Where most teams actually start. It works — until you need versioning, until two people edit it, until a downstream pipeline breaks because someone renamed a concept and there’s no diff, no history, no API. Silent concept drift, shipped to production.
Option 3: An enterprise governance suite. Collibra, PoolParty, Atlan. These are real and capable — and they’re built for enterprise governance committees, with contracts that need board approval and deployment timelines measured in months. That’s not a tool you reach for to make this sprint’s GraphRAG retrieve correctly.
So the engineer who just wants their RAG to stop being confidently wrong is stuck between a desktop tool built for academics, a spreadsheet that doesn’t scale, and a contract that needs sign-off. Most pick the spreadsheet and hope.
Treat your ontology like you treat your code
Here’s the shift that makes this tractable: stop treating your ontology as a document, and start treating it like code.
Your codebase is complex too. You don’t manage it in a spreadsheet or a six-figure governance suite. You manage it with version control, diffs, reviewable changes, and an API. Every change is attributable and reversible. You can tag a release, compare two versions, and roll back a mistake before it reaches production.
An ontology deserves exactly the same treatment, because it has exactly the same failure mode: a small unreviewed change silently breaks something downstream. Rename a concept, and every pipeline, retriever, and export that depended on it drifts. The fix isn’t more governance ceremony — it’s the same primitives engineers already trust:
- Versioning — every change reversible and attributable.
- Diffs — see exactly what changed between two versions before it ships.
- Snapshots / tags — pin a known-good ontology version your pipeline depends on.
- An API — so your retriever, your dbt models, and your LangChain app all read from one source of truth instead of a copied-and-pasted CSV.
When your ontology is versioned like code and exposed through a stable API, “concept drift across pipelines” turns from a silent production bug into a change you can see in a diff. Everything reads from one source of truth, and you can prove what it looked like on any given day.
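The primitives above need very little machinery. The sketch below assumes the ontology is a JSON-serializable dict with a `concepts` list (a made-up shape for illustration). It shows a diff that surfaces a rename and a content hash that a pipeline can pin to, so an unreviewed change fails loudly at load time.

```python
import hashlib
import json


def fingerprint(ontology: dict) -> str:
    """Stable content hash: identical ontologies always produce the same digest."""
    canonical = json.dumps(ontology, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff(old: dict, new: dict) -> dict:
    """Report which concepts were added or removed between two versions."""
    old_concepts, new_concepts = set(old["concepts"]), set(new["concepts"])
    return {
        "added": sorted(new_concepts - old_concepts),
        "removed": sorted(old_concepts - new_concepts),
    }


def load_pinned(ontology: dict, expected: str) -> dict:
    """Refuse to run against an ontology that differs from the pinned version."""
    if fingerprint(ontology) != expected:
        raise ValueError("ontology does not match the pinned version")
    return ontology


v1 = {"concepts": ["ConsumerCustomer", "EnterpriseCustomer"]}
v2 = {"concepts": ["ConsumerCustomer", "EnterpriseAccount"]}  # someone renamed a concept

changes = diff(v1, v2)
print(changes)  # the rename shows up as one removal plus one addition

PINNED = fingerprint(v1)   # the known-good version this pipeline was built against
load_pinned(v1, PINNED)    # passes; load_pinned(v2, PINNED) would raise ValueError
```

A hosted ontology service would usually expose named versions or tags in place of raw hashes. The hash is just the smallest thing that gives you "pin a known-good version" without extra infrastructure.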
What this looks like concretely
A minimal, useful ontology for the renewal example isn’t a research artifact — it’s a handful of well-defined concepts:
Customer
├─ EnterpriseCustomer governed_by → EnterpriseRenewalPolicy
└─ ConsumerCustomer governed_by → ConsumerRenewalPolicy
RenewalPolicy
├─ EnterpriseRenewalPolicy
└─ ConsumerRenewalPolicy
That’s it. With this in place, GraphRAG can resolve “enterprise renewals” to the right policy node and refuse the consumer chunk, because the relationship is explicit, not inferred from word overlap. You don’t need a PhD to author that. You need a place to define it, version it, and serve it to your stack. Define it once, export it as SKOS or JSON-LD, and have your retriever pull it live, and this class of wrong-chunk bug becomes far harder to ship.
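For a sense of what the export could look like, here is a hedged sketch that serializes the tree above as JSON-LD using SKOS terms (`skos:Concept`, `skos:prefLabel`, `skos:broader`). The `https://example.com/ontology/` namespace and the `ex:governed_by` property are placeholders invented for this example. Modeling the hierarchy with `skos:broader` is also a simplification, since a stricter ontology might use OWL or RDFS classes instead.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "https://example.com/ontology/"  # hypothetical namespace for this example

# child -> parent, mirroring the tree above
HIERARCHY = {
    "EnterpriseCustomer": "Customer",
    "ConsumerCustomer": "Customer",
    "EnterpriseRenewalPolicy": "RenewalPolicy",
    "ConsumerRenewalPolicy": "RenewalPolicy",
}

# customer type -> the policy that governs it
GOVERNED_BY = {
    "EnterpriseCustomer": "EnterpriseRenewalPolicy",
    "ConsumerCustomer": "ConsumerRenewalPolicy",
}


def to_jsonld(hierarchy: dict, governed_by: dict) -> dict:
    """Build a JSON-LD document with one SKOS concept per ontology node."""
    names = sorted(set(hierarchy) | set(hierarchy.values()))
    graph = []
    for name in names:
        node = {"@id": f"ex:{name}", "@type": "skos:Concept", "skos:prefLabel": name}
        if name in hierarchy:
            node["skos:broader"] = {"@id": f"ex:{hierarchy[name]}"}
        if name in governed_by:
            # Custom, non-SKOS property carrying the typed relationship.
            node["ex:governed_by"] = {"@id": f"ex:{governed_by[name]}"}
        graph.append(node)
    return {"@context": {"skos": SKOS, "ex": EX}, "@graph": graph}


doc = to_jsonld(HIERARCHY, GOVERNED_BY)
print(json.dumps(doc, indent=2))
```

The output is plain JSON, so a retriever can load it with nothing more than a JSON parser, while RDF-aware tools can still read it as linked data.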
The takeaway
If your RAG is confidently wrong, don’t start by swapping the embedding model. Start by asking whether your system has any notion of structured meaning at all — types, hierarchies, and typed relationships — or whether it’s relying entirely on “these words look alike.”
GraphRAG is the answer the industry converged on. But GraphRAG without a curated, versioned ontology is just a more complicated way to be wrong. The ontology is the part that makes it trustworthy — and it no longer requires a PhD, a spreadsheet you’ll regret, or a contract that needs board approval.