Question 1

What is Interpretability?

Accepted Answer

The research field concerned with understanding what actually happens inside an AI model: which internal components activate for which inputs, and why the model makes specific decisions. It differs from explainability (getting a model such as Claude to explain its output in words) in that interpretability aims at a mechanical understanding of the model's internals. Anthropic has a dedicated interpretability research team and publishes its findings publicly. The work is widely regarded as foundational for making AI systems trustworthy at scale.
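
To make "which internal components activate for which inputs" concrete, here is a minimal sketch on a toy network. Everything in it is an assumption made for illustration: the weights are set by hand so that a two-unit ReLU network computes XOR, and the names `hidden_activations`, `forward`, and `activation_table` are hypothetical. It is not how any production model or research tool works; it only shows the basic move of reading internal activations instead of asking the model to describe itself.

```python
from itertools import product

# Hypothetical hand-set weights (an assumption for illustration):
# a 2-input, 2-hidden-unit ReLU network that computes XOR.
HIDDEN_WEIGHTS = [(1.0, 1.0), (1.0, 1.0)]  # one weight tuple per hidden unit
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [1.0, -2.0]


def relu(value):
    """Rectified linear unit: pass positive values, clamp the rest to zero."""
    return max(0.0, value)


def hidden_activations(inputs):
    """Return the activation of each hidden unit for one input pair."""
    return [
        relu(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]


def forward(inputs):
    """Return the network output: a weighted sum of the hidden activations."""
    return sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden_activations(inputs)))


def activation_table():
    """Map every binary input pair to its hidden-unit activations."""
    return {inputs: hidden_activations(inputs) for inputs in product((0, 1), repeat=2)}


if __name__ == "__main__":
    for inputs, activations in activation_table().items():
        print(inputs, "hidden:", activations, "output:", forward(inputs))
```

Reading the table, the second hidden unit is nonzero only when both inputs are 1, so it acts as an AND detector that the output layer subtracts to cancel the first unit's response. That is an interpretability-style finding: it comes from inspecting the internals, not from the model's own account of its behavior.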

Question 2

What is another name for Interpretability?

Accepted Answer

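Interpretability is also known as mechanistic interpretability. Strictly speaking, mechanistic interpretability is the branch that reverse-engineers a model's internal computations, but the two terms are often used interchangeably.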