Hi there! I am a fourth-year PhD student in the Department of Physics at MIT. Previously, I was a math major at UC Berkeley. Here is my GitHub and a CV. I am also on Twitter @ericjmichaud_.
My Research
My current research focuses on improving our scientific and theoretical understanding of deep learning -- what deep neural networks do internally and why they work so well. This is part of a broader interest in the nature of intelligent systems, which previously led me to work with SETI astronomers, with Stuart Russell's AI alignment group (CHAI), and with Erik Hoel on a project related to integrated information theory. I am now supervised by Max Tegmark and supported by the NSF Graduate Research Fellowship Program.
Papers
Here is my Google Scholar page.
- Survival of the Fittest Representation: A Case Study with Modular Addition. Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud, Ziming Liu, Max Tegmark.
- Not All Language Model Features Are Linear. Joshua Engels, Isaac Liao, Eric J. Michaud, Wes Gurnee, Max Tegmark. Thread.
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models. Sam Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller. Feature circuit visualization website.
- Opening the AI black box: program synthesis via mechanistic interpretability. Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelic, Max Tegmark.
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, ..., Max Nadeau, Eric J. Michaud, Jacob Pfau, ..., Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell. 2023.
- The Quantization Model of Neural Scaling. Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark. NeurIPS, 2023.
- Precision Machine Learning. Eric J. Michaud, Ziming Liu, Max Tegmark. Entropy, 25(1), 175, 2023.
- Omnigrok: Grokking Beyond Algorithmic Data. Ziming Liu, Eric J. Michaud, Max Tegmark. ICLR (spotlight), 2023.
- Towards Understanding Grokking: An Effective Theory of Representation Learning. Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams. NeurIPS (oral), 2022. Blog post.
- Examining the Causal Structures of Deep Neural Networks Using Information Theory. Scythia Marrow, Eric J. Michaud, Erik Hoel. Entropy, 22(12):1429, 2020. Code. Videos.
- Understanding Learned Reward Functions. Eric J. Michaud, Adam Gleave, Stuart Russell. Deep RL Workshop, NeurIPS 2020. Code.
- Lunar Opportunities for SETI. Eric J. Michaud, Andrew Siemion, Jamie Drew, Pete Worden, 2020.
Selected Talks
- The Quantization Model of Neural Scaling, 5th Workshop on Neural Scaling Laws: Emergence and Phase Transitions, July 2023.
- The Quantization Model of Neural Scaling, MIT Department of Physics "The Impact of chatGPT and other large language models on physics research and education" workshop, July 2023.
- Omnigrok: Grokking Beyond Algorithmic Data, ICLR in Kigali, Rwanda, May 2023.
Pre-PhD Life
In my undergrad years, I was fortunate to work with some really lovely people on a variety of projects.
In the summer of 2020, I interned with Stuart Russell's AI safety group, the Center for Human-Compatible AI. Mentored by Adam Gleave, I worked on a paper exploring the use of interpretability techniques on learned reward functions. We presented the paper at the Deep RL Workshop at NeurIPS 2020.
In 2020, I also worked with the neuroscientist Erik Hoel. Our paper measuring effective information and integrated information in deep neural networks was published in the journal Entropy. The code is available here.
Previously, I worked with the Berkeley SETI Research Center (the Breakthrough Listen Initiative), and wrote a paper on the idea of doing radio-frequency SETI searches from the far side of the Moon. More info on the project, with some more links, can be found here. This work was the subject of a lovely article on supercluster.com.