Dylan Hadfield-Menell
Research Areas
Reinforcement Learning in Robotics, Adversarial Robustness in Machine Learning, Topic Modeling, Auction Theory and Applications, Advanced Bandit Algorithms Research
Most-Cited Works
- Cooperative Inverse Reinforcement Learning (2016), 322 citations
- Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks (2023), 96 citations
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (2023), 89 citations
- Building Human Values into Recommender Systems: An Interdisciplinary Synthesis (2023), 77 citations
- Guided search for task and motion plans using learned heuristics (2016), 67 citations
- Inverse Reward Design (2017), 63 citations
- Incomplete Contracting and AI Alignment (2019), 47 citations
- Black-Box Access is Insufficient for Rigorous AI Audits (2024), 45 citations
- On the Geometry of Adversarial Examples (2018), 43 citations