Announcing Goodfire's Fellowship Program for Interpretability Research

We're excited to announce that we'll be bringing on several Research Fellows and Research Engineering Fellows this fall for our fellowship program. Fellows will collaborate with senior members of our technical staff, contribute to core projects, and work full-time, in person, in our San Francisco office. Exceptional candidates will have the opportunity to convert to full-time research positions.

Why we're launching this program

We're launching the fellowship to accelerate interpretability research, which we believe is essential to building aligned, powerful AI models. The fellowship is designed for early- to mid-career researchers and engineers who are interested in the field; we're particularly excited about great engineers transitioning into interpretability research engineering.

We're focused on a number of research directions — e.g., scientific discovery via interpretability on scientific models, training interpretable models, and new interpreter methods — and the program will bring on a few talented researchers to push forward each direction.

What fellows should expect

Every fellow is expected to hit the ground running. The fellowship will be intensive: you'll be expected to learn new methods rapidly and to make real contributions to our research. By the end of the three months, every fellow will produce a tangible output. This might be a co-authored research paper, a product, or a piece of infrastructure.

By the start of the fellowship, all fellows will be matched with a senior researcher at Goodfire who will be their research collaborator.

Examples of our research directions

Representational structure of generalization/memorization - Jack Merullo

e.g. "Could we tell if gpt-oss was memorizing its training data?" and "Talking Heads"

Interpretability for scientific discovery - Dan Balsam, Michael Pearce, Nick Wang

e.g. Finding the Tree of Life in Evo 2; see Goodfire Announces Collaboration to Advance Genomic Medicine with AI Interpretability

Causal analysis - Atticus Geiger

e.g. Language Models use Lookbacks to Track Beliefs; see How Causal Abstraction Underpins Computational Explanation

Dynamics of representations - Ekdeep Singh Lubana

e.g. In-Context Learning of Representations (ICLR), In-context learning strategies emerge rationally

Other directions - Tom McGrath, Owen Lewis

Fellows will receive:

  • Competitive compensation aligned with experience and qualifications
  • Full coverage of necessary compute and API costs
  • Direct mentorship from a Member of Technical Staff
  • Opportunity to co-author published research in some cases

Who we're looking for

We are looking for talented early- to mid-career researchers or engineers with a strong background in ML who can independently execute an interpretability research project alongside senior researchers. A background in interpretability is not required (though we're very excited about candidates who have one); you should, however, demonstrate deep experience in an adjacent field and the ability to pick up new methods quickly. All fellows will need to demonstrate high ownership, agency, and creativity within their area of research.

We're excited about applicants with a range of skillsets, for example:

  • Large-scale reinforcement learning
  • Bayesian and causal inference
  • Signal processing
  • Model training and optimization
  • Model inference optimization
  • Distributed systems and parallel compute
  • Developer tooling and infra
  • Large-scale API infrastructure

Responsibilities:

  1. Execute an interpretability research project.
  2. Produce a co-authored research paper, a product, or a piece of infrastructure.
  3. Implement feedback from mentors while maintaining independent execution capability.
  4. Commit full-time, in-person hours.

How to apply

Applications to join our fellowship are now open.

You can apply to the Research Fellowship here, and the Research Engineering Fellowship here.

Please apply by October 31st, 11:59pm PT to be considered for fall start dates. Applications will be reviewed on a rolling basis.
