Chris Olah
Research Areas
Topic Modeling, Natural Language Processing Techniques, Adversarial Robustness in Machine Learning, Neural Networks and Applications, Explainable Artificial Intelligence (XAI)
Most-Cited Works
- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (2016), 9,726 citations
- Deconvolution and Checkerboard Artifacts (2016), 1,638 citations
- Feature Visualization (2017), 803 citations
- The Building Blocks of Interpretability (2018), 603 citations
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (2022), 360 citations
- Zoom In: An Introduction to Circuits (2020), 242 citations
- Multimodal Neurons in Artificial Neural Networks (2021), 213 citations
- Language Models (Mostly) Know What They Know (2022), 159 citations
- Activation Atlas (2019), 135 citations
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned (2022), 99 citations