Research

Fundamental interpretability research to understand and intentionally design advanced AI systems

Applied Research

Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train

Bergen et al.

June 11, 2026

Link post

Applied Research

Logits as a new monitor for evaluation awareness

Santiago Aranguri

June 4, 2026

Fundamental Research

Link post

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

Huang et al.

June 1, 2026

Fundamental Research

Can SAEs Capture Neural Geometry?

Bhalla et al.

May 21, 2026

Fundamental Research

A Geometric Calculator Inside a Neural Network

Feucht et al.

May 14, 2026

Link post

Predicting Rare LLM Failures with 30× Fewer Rollouts

Aranguri & Pernice

May 13, 2026

Fundamental Research

Steering Along Manifolds to Control Neural Networks

Wurgaft et al.

May 7, 2026

Fundamental Research

The World Inside Neural Networks

Geiger et al.

May 7, 2026

Fundamental Research

Paper Summary: Interpreting Language Model Parameters

Bushnaq et al.

May 5, 2026

Fundamental Research

Interpreting Language Model Parameters

Bushnaq et al.

May 5, 2026

Applied Research

Verbalized Eval Awareness Inflates Measured Safety

Aranguri and Bloom

May 4, 2026

Applied Research

Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training

Xiao and Aranguri

April 29, 2026

Applied Research

Explaining 4.2 million genetic variants with state-of-the-art, interpretable predictions

Pearce et al.

April 14, 2026

Fundamental Research

Covariance-based Sequence Pooling

Dooms et al.

April 10, 2026

Applied Research

Using Self-Correcting Search to Accelerate Materials Discovery

Hazra et al.

April 1, 2026

Applied Research

Reasoning Theater: Probing for Performative Chain-of-Thought

Boppana et al.

March 12, 2026

Fundamental Research

Features as Rewards: Using Interpretability to Reduce Hallucinations

Prasad et al.

February 11, 2026

Applied Research

Using Interpretability to Identify a Novel Class of Alzheimer's Biomarkers

Wang et al.

January 28, 2026

Fundamental Research

Understanding Memorization via Loss Curvature

Merullo et al.

November 6, 2025

Fundamental Research

Link post

Priors in Time: Missing Inductive Biases for Language Model Interpretability

Lubana et al.

November 3, 2025

Fundamental Research

Link post

Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering

Bigelow et al.

November 1, 2025

Applied Research

Deploying Interpretability to Production with Rakuten: SAE Probes for PII Detection

Nguyen et al.

October 28, 2025

Fundamental Research

Link post

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

Gur-Arieh et al.

October 7, 2025

Fundamental Research

Link post

Understanding Sparse Autoencoder Scaling in the Presence of Feature Manifolds

Michaud et al.

September 4, 2025

Applied Research

Finding the Tree of Life in Evo 2

Pearce et al.

August 28, 2025

Fundamental Research

Link post

Adversarial Examples Are Not Bugs, They Are Superposition

Gorton & Lewis

August 26, 2025

Applied Research

Discovering Undesired Rare Behaviors via Model Diff Amplification

Aranguri & McGrath

August 21, 2025

Fundamental Research

Link post

The Circuits Research Landscape: Results and Perspectives

Lindsey et al.

August 5, 2025

Fundamental Research

Towards Scalable Parameter Decomposition

Bushnaq et al.

June 28, 2025

Fundamental Research

Replicating Circuit Tracing for a Simple Known Mechanism

Loeffler et al.

June 11, 2025

Applied Research

Painting With Concepts Using Diffusion Model Latents

Cammarata et al.

May 27, 2025

Fundamental Research

Under the Hood of a Reasoning Model

Hazra et al.

April 15, 2025

Applied Research

Interpreting Evo 2: Arc Institute's Next-Generation Genomic Foundation Model

Gorton et al.

February 20, 2025

Fundamental Research

Link post

Open Problems in Mechanistic Interpretability

Sharkey et al.

January 27, 2025

Applied Research

Mapping the Latent Space of Llama 3.3 70B

McGrath et al.

December 23, 2024

Applied Research

Understanding and Steering Llama 3 with Sparse Autoencoders

McGrath et al.

September 25, 2024

Contact us

Interested in partnering with Goodfire?