RAG
Also: Retrieval-Augmented Generation, retrieval augmented generation
RAG (Retrieval-Augmented Generation) is an architecture in which relevant information is retrieved from an external source and passed to Claude before it generates a response. Instead of relying purely on training data, the system searches your documents, database, or knowledge base for the right context, and Claude uses that context to answer. The result: Claude can answer questions about your specific products, policies, or data, which it couldn't have known from training alone.
In practice
You want Claude to answer questions about your 500-page internal policy manual without pasting the whole thing into every conversation. A RAG system stores the manual as searchable chunks. When a question comes in, the relevant chunks are retrieved and given to Claude along with the question. Claude answers from your actual documentation, not from guesses based on its training data.
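The flow described above (store chunks, retrieve the relevant ones, hand them to Claude with the question) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: it ranks chunks by simple keyword overlap, where a real system would more typically use embedding-based search. The function names, the sample manual text, and the prompt wording are all hypothetical, and the final call to Claude is left out.

```python
import re
from collections import Counter

# Hypothetical stand-in for the policy manual; a real one would be loaded from files.
MANUAL = """Vacation: Full-time employees accrue 1.5 vacation days per month.

Expenses: Submit expense reports within 30 days of purchase.

Remote work: Employees may work remotely up to three days per week."""


def split_into_chunks(text: str) -> list[str]:
    """Split the document into chunks, here one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question: str, chunk: str) -> int:
    """Toy relevance score: number of word occurrences shared by question and chunk."""
    q, c = tokenize(question), tokenize(chunk)
    return sum(min(q[w], c[w]) for w in q)


def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks with the highest overlap score."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(question, ch), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the policy excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = split_into_chunks(MANUAL)
question = "How many vacation days do employees accrue?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string is what would be sent to Claude
```

Only the retrieved chunk travels with the question, so the prompt stays small no matter how long the manual is. Swapping `overlap_score` for a similarity measure over embeddings would change the ranking quality but not the overall shape of the pipeline.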
Where RAG shows up
6 articles
Building RAG is easy. Building RAG that doesn't silently degrade over time is hard. Here's the production-ready version, including the retrieval failures most tutorials don't mention.
Claude works best for research as a thinking partner, not a search engine. Here is how to use it effectively without being misled by confident-sounding errors.
Most teams jump to RAG because it sounds like the right answer. Half of them didn't need it. Here's how to know which situation you're in — before you build anything.
RAG is the most practical technique in AI engineering — and the most misnamed. It's not magic. It's just giving the model the right pages of the book before it answers.
RAG is one of the most powerful things you can build with Claude. It's also where a lot of teams get stuck. Here are the failure patterns worth knowing before you start.
New hires spend their first week confused and HR spends it answering the same 40 questions. Here is the workflow that fixes both — without building a chatbot.