Tom Lieberum
Google (United States), DeepMind (United Kingdom)
Research Areas
Natural Language Processing Techniques, Topic Modeling, Generative Adversarial Networks and Image Synthesis, Machine Learning and Data Classification, Speech Recognition and Synthesis
Most-Cited Works
- Progress measures for grokking via mechanistic interpretability (2023), 54 citations
- Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 (2024), 26 citations
- Evaluating Frontier Models for Dangerous Capabilities (2024), 9 citations
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla (2023), 6 citations
- Improving Dictionary Learning with Gated Sparse Autoencoders (2024), 2 citations
- Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders (2024), 2 citations
- Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders (2024)
- AtP*: An efficient and scalable method for localizing LLM behaviour to components (2024)