Ryan Greenblatt | doi.page

0 works, 0 citations, 0 h-index

Research Areas

Topic Modeling, Natural Language Processing Techniques, Adversarial Robustness in Machine Learning, Teaching and Learning Programming, Biomedical and Engineering Education

Most-Cited Works

→ Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (2024), 31 cited
→ Alignment faking in large language models (2024), 17 cited
→ AI Control: Improving Safety Despite Intentional Subversion (2023), 3 cited
→ Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (2025), 2 cited
→ Preventing Language Models From Hiding Their Reasoning (2023), 2 cited
→ Stress-Testing Capability Elicitation With Password-Locked Models (2024)
→ Bringing ROS to the Largest High School Robotics Competition (2018)
→ Believe It or Not: How Deeply do LLMs Believe Implanted Facts? (2025)