2025

Decomposing Deep Neural Network Minds into Parts

My PhD thesis.

Understanding sparse autoencoder scaling in the presence of feature manifolds

On the creation of narrow AI: hierarchy and nonlocality of neural network skills

An exploration of how neural networks learn circuits, how these circuits are expressed in the weights of the network, and how this bears on the problem of creating "narrow" AI systems.

Open Problems in Mechanistic Interpretability

Physics of Skill Learning

2024

Efficient Dictionary Learning with Switch Sparse Autoencoders

The Geometry of Concepts: Sparse Autoencoder Feature Structure

A Physics of Systems that Learn

Not All Language Model Features Are One-Dimensionally Linear

We find that large language models represent some cyclical quantities, such as the days of the week and the months of the year, with a circular geometry in activation space.
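As a toy picture of what such a circular representation means, here is a minimal sketch (an illustration in assumed coordinates, not code from the paper): the seven days of the week placed at evenly spaced angles on a circle in a 2D subspace, so that stepping forward one day is a fixed rotation of that plane.

```python
import numpy as np

# Toy illustration (not from the paper): place the seven days of the week
# at evenly spaced angles on a unit circle in a 2D subspace.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(7) / 7
embeddings = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (7, 2)

# With this geometry, "one day later" is the same rotation everywhere:
theta = 2 * np.pi / 7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
assert np.allclose(embeddings @ rotation.T, np.roll(embeddings, -1, axis=0))
```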

Survival of the Fittest Representation: A Case Study with Modular Addition

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code

2023

The Space of LLM Learning Curves

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

The Quantization Model of Neural Scaling

A theory of neural scaling based on the assumption that neural computation decomposes into a variety of atomic parts called "quanta".
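To make the scaling intuition concrete, here is a toy numerical sketch (a paraphrase in assumed notation, not the paper's code): if quanta are used with Zipfian frequencies p_k proportional to k^-(alpha+1), and a model learns the n most frequently used quanta first, then the residual loss from unlearned quanta falls off roughly as a power law in n.

```python
import numpy as np

# Toy sketch (assumed notation, not the paper's code): quanta are used with
# Zipfian frequencies p_k ~ k^-(alpha + 1), normalized over a large finite pool.
alpha = 0.5
K = 1_000_000                      # size of the toy pool of quanta
k = np.arange(1, K + 1)
p = k ** -(alpha + 1.0)
p /= p.sum()

def residual_loss(n):
    """Loss contribution of quanta not yet learned, if the n most
    frequent quanta are learned perfectly."""
    return p[n:].sum()

# The tail sum falls off roughly like n^-alpha (up to finite-pool effects),
# so loss versus number of learned quanta traces out a power law.
for n in [100, 1_000, 10_000]:
    print(n, residual_loss(n), n ** -alpha)
```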

2022

Precision Machine Learning

Omnigrok: Grokking Beyond Algorithmic Data

Towards Understanding Grokking: An Effective Theory of Representation Learning

An Analysis of Grokking

2020

Examining the Causal Structures of Deep Neural Networks Using Information Theory

Understanding Learned Reward Functions

Lunar Opportunities for SETI

Archive

Older blog posts (2018-2020)