Ryan Greenblatt
Research Areas
Topic Modeling, Natural Language Processing Techniques, Adversarial Robustness in Machine Learning, Teaching and Learning Programming, Biomedical and Engineering Education
Most-Cited Works
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (2024), 31 citations
- Alignment faking in large language models (2024), 17 citations
- AI Control: Improving Safety Despite Intentional Subversion (2023), 3 citations
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (2025), 2 citations
- Preventing Language Models From Hiding Their Reasoning (2023), 2 citations
- Stress-Testing Capability Elicitation With Password-Locked Models (2024)
- Bringing ROS to the Largest High School Robotics Competition (2018)
- Believe It or Not: How Deeply do LLMs Believe Implanted Facts? (2025)