AI Codex

When a well-crafted prompt isn't enough

Fine-tuning is how you train a model on your specific data to change its behavior at a deeper level than prompting can reach. It's powerful — and often unnecessary. Knowing which situation you're in saves a lot of time.


Most Claude applications never need fine-tuning. A good system prompt, well-structured context, and maybe some few-shot examples will get you 90% of the way to what you want.

But there's a class of problems where prompting hits a ceiling — and fine-tuning is the answer. Understanding the difference is how you avoid a lot of wasted effort in both directions.

What fine-tuning actually does

Fine-tuning continues the training process on a new, curated dataset. Instead of training from scratch on the broad internet, you train on examples that demonstrate exactly the behavior you want.

This is different from prompting in a fundamental way. Prompting shapes what Claude does with a specific input at inference time. Fine-tuning changes the weights — the underlying parameters — so that the desired behavior is baked in rather than instructed.

The result is a model that has genuinely learned a new pattern, not one that's been told how to behave.
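Concretely, the "curated dataset" is usually a file of input-output pairs that demonstrate the target behavior. The exact schema varies by provider, so treat this as a generic illustration, not any particular API's format:

```python
import json

# Illustrative training examples: each pair shows the model an input and
# exactly the output you want it to learn to produce. Field names here
# ("input"/"output") are an assumption; real providers define their own schema.
examples = [
    {"input": "Summarize: The meeting moved to 3pm.",
     "output": "Meeting rescheduled to 3pm."},
    {"input": "Summarize: Q3 revenue grew 12% year over year.",
     "output": "Q3 revenue up 12% YoY."},
]

def to_jsonl(records):
    """Serialize training examples as JSONL, one JSON object per line --
    the common interchange format for fine-tuning datasets."""
    return "\n".join(json.dumps(r) for r in records)

dataset = to_jsonl(examples)
```

The point of the format is the pairing: the model's weights are nudged so that inputs like these produce outputs like these, with no instructions present at inference time.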

When prompting is enough

Before reaching for fine-tuning, it's worth being honest about whether you've actually exhausted prompting approaches:

  • Have you written detailed, specific instructions with examples?
  • Have you included several few-shot examples in your system prompt?
  • Have you tested different phrasings and structures?
  • Have you tried decomposing the task into clearer steps?

For most use cases — content generation, Q&A, summarization, extraction, classification — a well-crafted prompt with examples will match fine-tuned performance. Prompting is faster, cheaper, and easier to iterate on.
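The few-shot pattern from the checklist above can be as simple as composing instructions and worked examples into the prompt. This is one common layout, sketched with hypothetical helper and variable names:

```python
def build_few_shot_prompt(instructions, examples, query):
    """Compose a few-shot prompt: instructions first, then worked
    input/output examples, then the new input awaiting an answer."""
    parts = [instructions, ""]
    for ex_input, ex_output in examples:
        parts.append(f"Input: {ex_input}")
        parts.append(f"Output: {ex_output}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Rewrite each sentence in plain, direct language.",
    [("The aforementioned document was perused by the undersigned.",
      "I read the document.")],
    "Utilization of the facility is restricted to authorized personnel.",
)
```

Two or three examples like this often move output quality more than paragraphs of abstract instruction, because they show the transformation rather than describe it.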

When fine-tuning earns its cost

Fine-tuning pays off when:

You need consistent style that prompting can't hold. If your brand voice is highly specific — particular cadence, vocabulary choices, structural patterns — and you need it perfectly consistent across thousands of outputs, fine-tuning learns the style in a way prompting can only approximate.

You have a narrow, high-volume task. If you're running the same extraction or classification task millions of times, a fine-tuned model is faster and cheaper per inference than a large model with a long system prompt.

You need to compress a long prompt. Complex instructions take tokens. A fine-tuned model can learn behaviors that would otherwise require hundreds of tokens of prompting, which matters at scale.
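The economics of a repeated prompt are easy to sketch. All the numbers below are illustrative assumptions, not real prices, but the shape of the arithmetic is what matters:

```python
# Back-of-envelope cost of re-sending a long system prompt on every call.
# Every figure here is a hypothetical assumption for illustration.
prompt_tokens = 800                   # instructions + few-shot examples
calls_per_month = 5_000_000           # a high-volume, narrow task
price_per_million_tokens = 3.00       # hypothetical input-token price, USD

monthly_prompt_cost = (
    prompt_tokens * calls_per_month / 1_000_000 * price_per_million_tokens
)
print(f"${monthly_prompt_cost:,.0f}/month spent re-sending the same prompt")
# -> $12,000/month spent re-sending the same prompt
```

A fine-tuned model that has internalized those instructions drops that per-call overhead to zero, which is why this argument only bites at high volume.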

You have labeled data and a clear ground truth. Fine-tuning without good training data makes things worse, not better. If you have a clean dataset of input-output pairs that represent exactly what you want, you have the raw material for effective fine-tuning.
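Because bad training data actively hurts, it's worth running basic sanity checks before training. A minimal sketch, assuming your data is a list of (input, output) pairs:

```python
def validate_pairs(pairs):
    """Flag the two cheapest-to-catch dataset problems: empty fields and
    duplicate inputs labeled with conflicting outputs."""
    issues = []
    seen = {}
    for i, (inp, out) in enumerate(pairs):
        if not inp.strip() or not out.strip():
            issues.append(f"example {i}: empty input or output")
        if inp in seen and seen[inp] != out:
            issues.append(f"example {i}: conflicting output for duplicate input")
        seen.setdefault(inp, out)
    return issues
```

Conflicting labels are especially corrosive: the model is being trained toward two different answers for the same input, which degrades consistency — the very thing fine-tuning is supposed to buy you.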

What fine-tuning can't do

Fine-tuning doesn't add new knowledge. It shapes behavior. If you want the model to know about your company's products, your processes, or your customers, that's a job for RAG — not fine-tuning. Trying to bake knowledge into a fine-tuned model is inefficient and the knowledge gets stale.

Fine-tuning also doesn't fix fundamental model limitations. If Claude makes reasoning errors on a class of problems, fine-tuning on more examples of that problem type may help marginally, but it won't change the underlying capability.

The practical path

If you're unsure whether you need fine-tuning, start with prompting. Build a good eval set. Measure where you're falling short. Only move to fine-tuning when you have clear evidence that prompting is the bottleneck — not a theory that it might be.
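An eval set doesn't need to be elaborate to give you that evidence. A minimal sketch — `model_fn` is a hypothetical stand-in for whatever calls your model with your current prompt:

```python
def evaluate(model_fn, eval_set):
    """Score a model/prompt combination against labeled examples.
    eval_set is a list of (input, expected_output) pairs; returns accuracy."""
    correct = sum(
        1 for inp, expected in eval_set
        if model_fn(inp).strip() == expected
    )
    return correct / len(eval_set)
```

Run this against your prompted baseline first. If iterating on the prompt stops moving the score, that plateau is the evidence that justifies fine-tuning; re-running the same eval on the fine-tuned model then tells you whether it was worth it.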

The pattern you'll often find: prompting with good examples gets you close. Fine-tuning gets you the last mile. Both are worth having in your toolkit, but prompting almost always comes first.