Evaluation & Safety ◆ Claude, Developers, CTOs
Interpretability
Also: mechanistic interpretability
The study of the internal computations of neural networks. It is a research frontier at Anthropic aimed at explaining what is actually happening inside models.