Understand and debug your AI model
There is remarkable mathematical structure and geometry within neural networks. We help you uncover the hidden representations inside your model to remove the guesswork from AI training, going from alchemy to precision engineering.







We believe that AI is the most consequential technology of our time, yet today we train models with remarkably little understanding of the nature of their intelligence.
We’re the research lab dedicated to creating the science and technology to change that.
Novel methods to understand, debug, and design your AI model
Understand
Reverse engineer the causal mechanisms of AI to reveal its internal structure, uncovering novel science and validating when predictions reflect true understanding.
We identified a novel class of biomarkers for Alzheimer's detection by interpreting an epigenetic model, the first major finding in the natural sciences obtained from reverse-engineering a foundation model.

We decoded the internal representations of Arc Institute's Evo 2 genomic model, finding features that map onto biological concepts from coding sequences to protein secondary structure. Published in Nature.

We used Evo 2 embeddings to predict whether and how genetic variants cause disease, achieving state-of-the-art performance and interpretable-by-design predictions.
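As a rough sketch of how embedding-based variant effect prediction can work (everything below is illustrative, not our production pipeline; the embedding function is a trivial stand-in you would swap for real Evo 2 embeddings):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def embed_sequence(seq: str) -> np.ndarray:
    """Stand-in for a genomic foundation model embedding (e.g. mean-pooled
    hidden states from Evo 2). Here: a trivial nucleotide-frequency vector
    so the sketch runs end to end; swap in real model embeddings."""
    return np.array([seq.count(b) / max(len(seq), 1) for b in "ACGT"])

def variant_features(ref_seq: str, alt_seq: str) -> np.ndarray:
    # Represent a variant by the shift it induces in embedding space.
    return embed_sequence(alt_seq) - embed_sequence(ref_seq)

def train_variant_classifier(variants):
    # variants: list of (ref_seq, alt_seq, label), label 1 = disease-causing
    X = np.stack([variant_features(r, a) for r, a, _ in variants])
    y = np.array([lbl for _, _, lbl in variants])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf
```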
Debug
Precisely debug issues with model behavior, identify and remove confounders, and diagnose failures before they occur in production.
We tracked “performative chain-of-thought”: cases where models “know” their final answer but continue to generate chain-of-thought anyway. We showed that probes can enable early exit from reasoning traces, saving up to 68% of tokens with minimal accuracy loss.
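A minimal sketch of the probe idea (hypothetical shapes and threshold, not the exact setup from the paper): train a linear probe on hidden states from partial reasoning traces to predict the final answer, then stop generating chain-of-thought once the probe is confident.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_answer_probe(hidden_states: np.ndarray, final_answers: np.ndarray):
    """Train a linear probe on hidden states taken at intermediate reasoning
    steps, labeled with the answer the model eventually produced.
    hidden_states: (n_examples, d_model), final_answers: (n_examples,)."""
    return LogisticRegression(max_iter=1000).fit(hidden_states, final_answers)

def maybe_early_exit(probe, hidden_state: np.ndarray, threshold: float = 0.95):
    """After each reasoning step, check whether the probe already 'knows'
    the answer; if so, exit early instead of generating more tokens."""
    probs = probe.predict_proba(hidden_state.reshape(1, -1))[0]
    best = int(np.argmax(probs))
    if probs[best] >= threshold:
        return probe.classes_[best]   # confident: stop the chain-of-thought here
    return None                       # not confident yet: keep reasoning
```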
We analyzed the latent space of a cardiac vision model to determine whether it had learned clinically meaningful structure rather than brittle shortcuts: we found mid-layer activation instability in weaker variants, confirmed robust use of temporal signal, and showed anatomically grounded attention in the strongest model.

We worked with a robotics team to diagnose why some checkpoints produced unstable rollouts. By inspecting latent policy structure and representational geometry directly, we traced unstable behaviors to brittle internal features.
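One way to make "representational geometry" concrete (an illustrative check, not the specific diagnostic we ran for this team) is to compare layer activations across checkpoints with linear CKA on a fixed batch of evaluation inputs; sharp drops in similarity can flag checkpoints whose internal features have shifted.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, d).
    Values near 1 mean the two representations are highly similar."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

def flag_unstable_checkpoints(activations_by_ckpt, reference_ckpt, min_cka=0.8):
    """activations_by_ckpt: dict mapping checkpoint name -> (n_samples, d)
    array of policy activations on the same evaluation inputs."""
    ref = activations_by_ckpt[reference_ckpt]
    return [name for name, acts in activations_by_ckpt.items()
            if name != reference_ckpt and linear_cka(ref, acts) < min_cka]
```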
Design
Control training precisely to ensure your model learns what you want with less data and fewer off-target effects.
We cut hallucinations in an LLM by 58% by using interpretability to guide model training. Our approach was ~90x lower cost per intervention than LLM-as-judge, with no degradation in standard benchmarks.
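As a hedged illustration of interpretability-guided training (a generic sketch, not our actual hallucination intervention): add an auxiliary loss that penalizes activation along a direction previously identified with the unwanted behavior. The code assumes a Hugging Face-style model whose forward pass returns a loss, and a hook that records activations into `layer_acts`; both are assumptions for the example.

```python
import torch

def guided_training_step(model, batch, bad_direction, layer_acts, penalty_weight=0.1):
    """One training step with an interpretability-derived penalty.
    bad_direction: unit vector of shape (d_model,) identified beforehand as
    associated with the unwanted behavior (e.g. via probing or feature analysis).
    layer_acts: dict filled by a forward hook; assumed to hold
    layer_acts["target_layer"] with shape (batch, seq, d_model)."""
    outputs = model(**batch)
    task_loss = outputs.loss

    acts = layer_acts["target_layer"]          # (B, T, d_model)
    projection = acts @ bad_direction          # component along the flagged direction
    penalty = projection.pow(2).mean()         # discourage activating that direction

    loss = task_loss + penalty_weight * penalty
    loss.backward()
    return task_loss.detach(), penalty.detach()
```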

We gave a diffusion model a feedback loop from its own internals, resulting in ~30% more viable candidate materials with target properties.

Our essay on intentional design describes our vision for using interpretability to guide model training: moving from guess-and-check to closed-loop control.

Contact us
Interested in partnering with Goodfire?


