Understand and debug your AI model
There is remarkable mathematical structure and geometry within neural networks. We help you uncover the hidden representations inside your model to remove the guesswork from AI training, going from alchemy to precision engineering.







We believe that AI is the most consequential technology of our time, yet today we train models with remarkably little understanding of the nature of their intelligence.
We’re the research lab dedicated to creating the science and technology to change that.
Novel methods to understand, debug, and design your AI model
Understand
Reverse engineer the causal mechanisms of AI to reveal its internal structure, uncovering novel science and validating when predictions reflect true understanding.
We identified a novel class of biomarkers for Alzheimer's detection by interpreting an epigenetic model, the first major finding in the natural sciences obtained from reverse-engineering a foundation model.

We decoded the internal representations of Arc Institute's Evo 2 genomic model, finding features that map onto biological concepts from coding sequences to protein secondary structure. Published in Nature.

We used Evo 2 embeddings to predict whether and how genetic variants cause disease, achieving state-of-the-art performance and interpretable-by-design predictions.
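As a rough sketch of how embedding-based variant effect prediction can work (everything below is illustrative, not our production pipeline; the embedding function is a trivial stand-in you would swap for real Evo 2 embeddings):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def embed_sequence(seq: str) -> np.ndarray:
    """Stand-in for a genomic foundation model embedding (e.g. mean-pooled
    hidden states from Evo 2). Here: a trivial nucleotide-frequency vector
    so the sketch runs end to end; swap in real model embeddings."""
    return np.array([seq.count(b) / max(len(seq), 1) for b in "ACGT"])

def variant_features(ref_seq: str, alt_seq: str) -> np.ndarray:
    # Represent a variant by the shift it induces in embedding space.
    return embed_sequence(alt_seq) - embed_sequence(ref_seq)

def train_variant_classifier(variants):
    # variants: list of (ref_seq, alt_seq, label), label 1 = disease-causing
    X = np.stack([variant_features(r, a) for r, a, _ in variants])
    y = np.array([lbl for _, _, lbl in variants])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf
```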
Debug
Precisely debug issues with model behavior, identify and remove confounders, and diagnose failures before they occur in production.
We tracked “performative chain-of-thought”: cases where models “know” their final answer but continue to generate chain-of-thought anyway. We showed that probes can enable early exit from reasoning traces, saving up to 68% of tokens with minimal accuracy loss.
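A minimal sketch of the probe idea (hypothetical shapes and threshold, not the exact setup from the paper): train a linear probe on hidden states from partial reasoning traces to predict the final answer, then stop generating chain-of-thought once the probe is confident.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_answer_probe(hidden_states: np.ndarray, final_answers: np.ndarray):
    """Train a linear probe on hidden states taken at intermediate reasoning
    steps, labeled with the answer the model eventually produced.
    hidden_states: (n_examples, d_model), final_answers: (n_examples,)."""
    return LogisticRegression(max_iter=1000).fit(hidden_states, final_answers)

def maybe_early_exit(probe, hidden_state: np.ndarray, threshold: float = 0.95):
    """After each reasoning step, check whether the probe already 'knows'
    the answer; if so, exit early instead of generating more tokens."""
    probs = probe.predict_proba(hidden_state.reshape(1, -1))[0]
    best = int(np.argmax(probs))
    if probs[best] >= threshold:
        return probe.classes_[best]   # confident: stop the chain-of-thought here
    return None                       # not confident yet: keep reasoning
```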
We analyzed the latent space of a cardiac vision model to determine whether it had learned clinically meaningful structure rather than brittle shortcuts: we found mid-layer activation instability in weaker variants, confirmed robust use of temporal signal, and showed anatomically grounded attention in the strongest model.

We worked with a robotics team to diagnose why some checkpoints produced unstable rollouts. By inspecting latent policy structure and representational geometry directly, we traced unstable behaviors to brittle internal features.
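One way to make "representational geometry" concrete (an illustrative check, not the specific diagnostic we ran for this team) is to compare layer activations across checkpoints with linear CKA on a fixed batch of evaluation inputs; sharp drops in similarity can flag checkpoints whose internal features have shifted.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, d).
    Values near 1 mean the two representations are highly similar."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

def flag_unstable_checkpoints(activations_by_ckpt, reference_ckpt, min_cka=0.8):
    """activations_by_ckpt: dict mapping checkpoint name -> (n_samples, d)
    array of policy activations on the same evaluation inputs."""
    ref = activations_by_ckpt[reference_ckpt]
    return [name for name, acts in activations_by_ckpt.items()
            if name != reference_ckpt and linear_cka(ref, acts) < min_cka]
```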
Design
Control training precisely to ensure your model learns what you want with less data and fewer off-target effects.
We cut hallucinations in an LLM by 58% by using interpretability to guide model training. Our approach was ~90x lower cost per intervention than LLM-as-judge, with no degradation in standard benchmarks.
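As a hedged illustration of interpretability-guided training (a generic sketch, not our actual hallucination intervention): add an auxiliary loss that penalizes activation along a direction previously identified with the unwanted behavior. The code assumes a Hugging Face-style model whose forward pass returns a loss, and a hook that records activations into `layer_acts`; both are assumptions for the example.

```python
import torch

def guided_training_step(model, batch, bad_direction, layer_acts, penalty_weight=0.1):
    """One training step with an interpretability-derived penalty.
    bad_direction: unit vector of shape (d_model,) identified beforehand as
    associated with the unwanted behavior (e.g. via probing or feature analysis).
    layer_acts: dict filled by a forward hook; assumed to hold
    layer_acts["target_layer"] with shape (batch, seq, d_model)."""
    outputs = model(**batch)
    task_loss = outputs.loss

    acts = layer_acts["target_layer"]          # (B, T, d_model)
    projection = acts @ bad_direction          # component along the flagged direction
    penalty = projection.pow(2).mean()         # discourage activating that direction

    loss = task_loss + penalty_weight * penalty
    loss.backward()
    return task_loss.detach(), penalty.detach()
```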

We gave a diffusion model a feedback loop from its own internals, resulting in ~30% more viable candidate materials with target properties.

Our essay on intentional design describes our vision for using interpretability to guide model training: moving from guess-and-check to closed-loop control.

Contact us
Interested in partnering with Goodfire?


