Research

This page should redirect you to the arXiv paper.

Understanding Memorization via Loss Curvature

November 6, 2025

Priors in Time: Missing Inductive Biases for Language Model Interpretability

November 3, 2025

Deploying Interpretability to Production with Rakuten: SAE Probes for PII Detection

October 28, 2025

Eric Bigelow, Daniel Wurgaft, YingQiao Wang, Noah Goodman, Tomer Ullman, Hidenori Tanaka, Ekdeep Singh Lubana
Fundamental Research
Link post