Research

Fundamental interpretability research to understand and intentionally design advanced AI systems

Fundamental Research

Understanding Memorization via Loss Curvature

Merullo et al.
·
November 6, 2025
Fundamental Research
Link post

Priors in Time: Missing Inductive Biases for Language Model Interpretability

Lubana et al.
·
November 3, 2025
Fundamental Research
Link post

Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering

Bigelow et al.
·
November 1, 2025
Applied Research

Deploying Interpretability to Production with Rakuten: SAE Probes for PII Detection

Nguyen et al.
·
October 28, 2025
Fundamental Research
Link post

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

Gur-Arieh et al.
·
October 7, 2025
Fundamental Research
Link post

Understanding Sparse Autoencoder Scaling in the Presence of Feature Manifolds

Michaud et al.
·
September 4, 2025
Applied Research

Finding the Tree of Life in Evo 2

Pearce et al.
·
August 28, 2025
Fundamental Research
Link post

Adversarial Examples Are Not Bugs, They Are Superposition

Gorton & Lewis
·
August 26, 2025
Applied Research

Discovering Undesired Rare Behaviors via Model Diff Amplification

Aranguri & McGrath
·
August 21, 2025
Fundamental Research
Link post

The Circuits Research Landscape: Results and Perspectives

Lindsey et al.
·
August 5, 2025
Fundamental Research

Towards Scalable Parameter Decomposition

Bushnaq et al.
·
June 28, 2025
Fundamental Research

Replicating Circuit Tracing for a Simple Known Mechanism

Loeffler et al.
·
June 11, 2025
Applied Research

Painting With Concepts Using Diffusion Model Latents

Cammarata et al.
·
May 27, 2025
Fundamental Research

Under the Hood of a Reasoning Model

Hazra et al.
·
April 15, 2025
Applied Research

Interpreting Evo 2: Arc Institute's Next-Generation Genomic Foundation Model

Gorton et al.
·
February 20, 2025
Fundamental Research
Link post

Open Problems in Mechanistic Interpretability

Sharkey et al.
·
January 27, 2025
Applied Research

Mapping the Latent Space of Llama 3.3 70B

McGrath et al.
·
December 23, 2024
Applied Research

Understanding and Steering Llama 3 with Sparse Autoencoders

McGrath et al.
·
September 25, 2024
