AI Codex
Foundation Models & LLMsCore Definition

When a well-crafted prompt isn't enough

In brief

Fine-tuning is how you train a model on your specific data to change its behavior at a deeper level than prompting can reach. It's powerful — and often unnecessary. Knowing which situation you're in saves a lot of time.

5 min read·Fine-tuning

Contents

Sign in to save

Most Claude applications never need fine-tuning. A good system prompt, well-structured context, and maybe some few-shot examples will get you 90% of the way to what you want.

But there's a class of problems where prompting hits a ceiling — and fine-tuning is the answer. Understanding the difference is how you avoid a lot of wasted effort in both directions.

What fine-tuning actually does

Fine-tuning continues the training process on a new, curated dataset. Instead of training from scratch on the broad internet, you train on examples that demonstrate exactly the behavior you want.

This is different from prompting in a fundamental way. Prompting shapes what Claude does with a specific input at inference time. Fine-tuning changes the weights — the underlying parameters — so that the desired behavior is baked in rather than instructed.

The result is a model that has genuinely learned a new pattern, not one that's been told how to behave.

When prompting is enough

Before reaching for fine-tuning, it's worth being honest about whether you've fully exhausted prompting approaches:

  • Have you written detailed, specific instructions with examples?
  • Have you included several few-shot examples in your system prompt?
  • Have you tested different phrasings and structures?
  • Have you tried decomposing the task into clearer steps?

For most use cases — content generation, Q&A, summarization, extraction, classification — a well-crafted prompt with examples will match fine-tuned performance. Prompting is faster, cheaper, and easier to iterate on.

When fine-tuning earns its cost

Fine-tuning pays off when:

You need consistent style that prompting can't hold. If your brand voice is highly specific — particular cadence, vocabulary choices, structural patterns — and you need it perfectly consistent across thousands of outputs, fine-tuning learns the style in a way prompting approximates.

You have a narrow, high-volume task. If you're running the same extraction or classification task millions of times, a fine-tuned model is faster and cheaper per inference than a large model with a long system prompt.

You need to compress a long prompt. Complex instructions take tokens. A fine-tuned model can learn behaviors that would otherwise require hundreds of tokens of prompting, which matters at scale.

You have labeled data and a clear ground truth. Fine-tuning without good training data makes things worse, not better. If you have a clean dataset of input-output pairs that represent exactly what you want, you have the raw material for effective fine-tuning.

What fine-tuning can't do

Fine-tuning doesn't add new knowledge. It shapes behavior. If you want the model to know about your company's products, your processes, or your customers, that's a job for RAG — not fine-tuning. Trying to bake knowledge into a fine-tuned model is inefficient and the knowledge gets stale.

Fine-tuning also doesn't fix fundamental model limitations. If Claude makes reasoning errors on a class of problems, fine-tuning on more examples of that problem type may help marginally, but it won't change the underlying capability.

The practical path

If you're unsure whether you need fine-tuning, start with prompting. Build a good eval set. Measure where you're falling short. Only move to fine-tuning when you have clear evidence that prompting is the bottleneck — not a theory that it might be.

The pattern you'll often find: prompting with good examples gets you close. Fine-tuning gets you the last mile. Both are worth having in your toolkit, but prompting almost always comes first.

Further reading

Weekly brief

For people actually using Claude at work.

Each week: one thing Claude can do in your work that most people haven't figured out yet — plus the failure modes to avoid. No tutorials. No hype.

No spam. Unsubscribe anytime.

What to read next

All articles →