AI Codex
Evaluation & Safety ClaudeDevelopersCTOs

Interpretability

Also: mechanistic interpretability

Understanding the internal computations of neural networks — a research frontier at Anthropic aimed at understanding what's actually happening inside models.