AI Codex

Why RAG implementations fail (and how to avoid the most common mistakes)

RAG is one of the most powerful things you can build with Claude. It's also where a lot of teams get stuck. Here are the failure patterns worth knowing before you start.


RAG — connecting Claude to your documents and data so it can answer questions grounded in real information — is genuinely transformative when it works. But the path from "let's build a RAG system" to "this reliably works in production" has some consistent failure points.

Here are the ones worth knowing before you start.

Failure 1: Retrieval brings back the wrong chunks

RAG works by splitting your documents into chunks, converting them into numerical vectors, and retrieving the chunks most similar to the user's question. The problem: "most similar" in vector space and "most relevant to answer this question" aren't always the same thing.

A user asks "What's our returns policy for international orders?" Your returns policy document has a section on international returns — but also sections on domestic returns, shipping terms, and exception cases. The retrieval system might surface all of those, or miss the right one entirely.

Fix: Invest in the quality of your chunking strategy. Chunks should be semantically coherent — a complete thought, not a paragraph cut in the middle of a sentence. Test your retrieval system separately from your generation system. If what's coming back isn't right, better prompts won't help.
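One simple way to keep chunks semantically coherent is to split on paragraph boundaries and merge short paragraphs up to a size budget, instead of cutting at a fixed byte offset. A minimal sketch (the function name and the character budget are illustrative, not from any particular library):

```python
import re

def chunk_by_paragraph(text: str, max_chars: int = 1200) -> list[str]:
    """Split on blank lines so each chunk is a complete thought,
    merging short paragraphs up to max_chars rather than cutting
    mid-sentence at a fixed offset."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = (
    "International returns.\n\nCustomers abroad have 30 days.\n\n"
    "Domestic returns.\n\nCustomers at home have 60 days."
)
for chunk in chunk_by_paragraph(doc, max_chars=60):
    print(repr(chunk))
```

In practice you'd split on document headings where they exist; the point is that chunk boundaries should follow the document's structure, not an arbitrary character count.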

Failure 2: The documents are outdated or inconsistent

If your knowledge base contains both an old pricing doc and a new one, Claude might synthesise them into something that reflects neither accurately. RAG retrieves; it doesn't curate.

Fix: Treat your document corpus like a product. Someone owns it. Documents have owners, review dates, and a deprecation process. "Add documents to the RAG system and never look at them again" produces confidently wrong answers about anything that's changed.
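"Treat the corpus like a product" can be as lightweight as attaching an owner and a review date to every document and running a staleness check before ingestion. A hypothetical sketch (the `CorpusDoc` schema and field names are made up for illustration):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CorpusDoc:
    path: str
    owner: str
    review_by: date        # next scheduled review
    deprecated: bool = False

def stale_docs(corpus: list[CorpusDoc], today: date) -> list[CorpusDoc]:
    """Flag documents that are deprecated or past their review date —
    these need a human pass before retrieval should see them."""
    return [d for d in corpus if d.deprecated or d.review_by < today]

corpus = [
    CorpusDoc("pricing-2023.md", "finance", date(2024, 1, 1), deprecated=True),
    CorpusDoc("pricing-2025.md", "finance", date(2026, 1, 1)),
]
for doc in stale_docs(corpus, today=date(2025, 6, 1)):
    print(f"needs attention: {doc.path} (owner: {doc.owner})")
```

Running a check like this in CI, so that a stale or deprecated document blocks the ingestion pipeline, is one way to make the "never look at them again" failure impossible by construction.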

Failure 3: Claude doesn't know what it doesn't know

Standard RAG: user asks question, system retrieves documents, Claude answers. The failure mode: the right document isn't in your corpus, so nothing relevant gets retrieved, and Claude answers from its general training data instead — without telling you that's what it's doing.

Fix: Instruct Claude explicitly to say when it doesn't have the relevant information in its provided context. "If the answer isn't in the documents provided, say so clearly. Don't answer from general knowledge." This is one of the most important instructions to include in a RAG system prompt.
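Concretely, that instruction lives in the system prompt, with the retrieved chunks injected alongside the question. A minimal sketch of the assembly step (the helper name and the `<document>` tag convention are illustrative choices, not a required format):

```python
SYSTEM_PROMPT = """\
Answer questions using only the documents provided below.
If the answer isn't in the documents provided, say so clearly.
Don't answer from general knowledge."""

def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the grounding instruction, the retrieved chunks,
    and the user's question into a single prompt string."""
    docs = "\n\n".join(f"<document>\n{c}\n</document>" for c in chunks)
    return f"{SYSTEM_PROMPT}\n\n{docs}\n\nQuestion: {question}"

prompt = build_prompt(
    "What's the returns window for international orders?",
    ["International returns: 30 days from delivery."],
)
print(prompt)
```

The same string works whether you pass it as a system prompt or prepend it to the user turn; what matters is that the refusal instruction is always present, not added only when you suspect retrieval failed.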

Failure 4: Too much context, not enough signal

Retrieving more documents isn't always better. If your retrieval returns ten chunks and only one is actually relevant, Claude has to find the signal in the noise. On easy questions this works. On hard ones, the irrelevant context can actively mislead.

Fix: Quality of retrieval over quantity. Better to retrieve three highly relevant chunks than ten loosely related ones. Tune your similarity threshold rather than increasing the number of results.
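The difference between "top ten by similarity" and "everything above a relevance floor, capped" is a one-line filter. A sketch with toy two-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the 0.75 floor is an illustrative value you'd tune on your own data):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, min_score=0.75, max_results=3):
    """Keep only chunks above a similarity floor, then cap the count:
    three highly relevant chunks beat ten loosely related ones."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    relevant = [(s, t) for s, t in scored if s >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return relevant[:max_results]

chunks = [
    ("international returns", [0.9, 0.1]),
    ("domestic returns", [0.7, 0.7]),
    ("shipping terms", [0.1, 0.9]),
]
for score, text in retrieve([1.0, 0.0], chunks):
    print(f"{score:.2f}  {text}")
```

With a plain top-k of 3, all three chunks would come back and two of them would be noise; with the floor, only the chunk that actually matches the query survives.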

The honest framing

RAG done well is one of the most valuable things you can build. RAG done quickly often looks impressive in demos and breaks in production. The investment is in the data pipeline, the retrieval quality, and the ongoing maintenance — not just the LLM integration.

If you're scoping a RAG project, budget time for document curation, retrieval testing, and iteration. Those are where the real work is.


Further reading