AI Codex
Retrieval & Knowledge · Core Definition

How to give Claude a memory it doesn't have by default

RAG is the most practical technique in AI engineering — and the most misnamed. It's not magic. It's just giving the model the right pages of the book before it answers.


Imagine you asked a brilliant friend a question about your company's internal processes. Your friend is smart, but they've never worked at your company. They don't know your procedures, your product, your customers, or your history.

Now imagine you handed them the relevant page from your internal wiki before asking. Suddenly they can give you a specific, accurate, useful answer — not a generic one.

That's RAG. Retrieval-Augmented Generation. The name sounds technical. The idea is simple.

The problem it solves

Claude — like all language models — was trained on a fixed dataset with a knowledge cutoff. It knows a lot about the world up to that point, but it doesn't know:

  • Your company's internal documents
  • Your product's current pricing
  • What happened last week
  • Anything proprietary or private

RAG is the standard solution. Instead of trying to train the model on your data (expensive, slow, not always possible), you retrieve relevant information at query time and include it in the context window. Claude reads it, reasons over it, answers based on it.

Fresh, specific, accurate — without retraining.

How it works in practice

A RAG system has two parts:

The knowledge base. Your documents, broken into smaller pieces (paragraphs, sections, pages) and stored in a vector database — a type of database that understands semantic similarity, not just keyword matching.

The retrieval step. When a user asks a question, the system finds the most semantically similar chunks from your knowledge base and pulls them out.
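These two parts can be sketched in a few lines. The example below uses a toy bag-of-words vector and cosine similarity so it runs with no dependencies; a production system would replace `vectorize` with an embedding model and the in-memory list with a vector database, but the shape of the retrieval step is the same.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Rank every chunk by similarity to the query, keep the top k.
    qv = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, vectorize(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Refunds are processed within 5 business days of approval.",
    "Our enterprise plan includes priority support.",
    "The office kitchen is cleaned every Friday.",
]
print(retrieve("How long do refunds take?", chunks, k=1))
```

Swapping in a real embedding model changes the quality of the similarity scores, not the structure of the code.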

Those chunks get assembled into a prompt — "Here is relevant context: [retrieved chunks]. Now answer this question: [user question]" — and sent to Claude. Claude reads the context, reasons over it, and answers. The user gets a response grounded in your actual data, with citations if you want them.
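The assembly step is just string construction. A minimal sketch, with the actual call to Claude shown as a commented-out example (the model name is illustrative, and the real call requires the `anthropic` SDK and an API key):

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Number the chunks so the model can cite them.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Here is relevant context:\n\n"
        f"{context}\n\n"
        f"Now answer this question: {question}\n"
        "Cite chunk numbers like [1] where you use them."
    )

prompt = build_rag_prompt(
    "How long do refunds take?",
    ["Refunds are processed within 5 business days of approval."],
)
print(prompt.splitlines()[0])

# Sending it to Claude with the official SDK would look roughly like:
#
#   import anthropic
#   client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
#   reply = client.messages.create(
#       model="claude-sonnet-4-5",  # illustrative model name
#       max_tokens=512,
#       messages=[{"role": "user", "content": prompt}],
#   )
#   print(reply.content[0].text)
```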

Why Claude is particularly good at this

RAG quality depends on two things: finding the right chunks (retrieval) and doing something useful with them (reasoning). Most improvement happens at the reasoning layer, and that's where Claude shines.

Claude's 200,000-token context window means you can retrieve generously — 20 chunks instead of 3 — without worrying about overflow. More context means fewer situations where the right answer was in chunk 4 but you only retrieved 3.

Claude also handles conflicting or incomplete information in retrieved context well. Rather than hallucinating a confident answer when the context is ambiguous, Claude tends to surface the uncertainty explicitly. For enterprise applications where accuracy matters, this is worth a lot.

The one thing people get wrong

RAG gets blamed for retrieval failures that are actually document quality failures.

If your internal docs are inconsistent, outdated, or poorly written, retrieval will find the wrong chunks — not because the system is broken, but because the signal in your documents is weak. The best RAG improvement you can make is often editing the source material, not tweaking retrieval parameters.

Clean docs lead to good retrieval, which leads to good answers. Garbage in, garbage out.

When to use it

RAG is the right tool when:

  • You have a body of knowledge Claude wasn't trained on
  • Your information changes frequently (RAG updates instantly; retraining doesn't)
  • You need Claude to cite specific sources
  • You're building a product that needs to stay current with your data

If the information Claude needs is already in its training data — general knowledge, coding, writing — RAG adds complexity without much benefit. Use it when you have something specific to say.


Further reading