Carson Denison | doi.page

0 works, 0 citations, 0 h-index

Research Areas

Topic Modeling, Ethics and Social Impacts of AI, Adversarial Robustness in Machine Learning, Explainable Artificial Intelligence (XAI), Natural Language Processing Techniques

Most-Cited Works

→ Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (2024), cited by 31
→ Measuring Faithfulness in Chain-of-Thought Reasoning (2023), cited by 21
→ Alignment faking in large language models (2024), cited by 17
→ Question Decomposition Improves the Faithfulness of Model-Generated Reasoning (2023), cited by 7
→ Gradient-Based Language Model Red Teaming (2024), cited by 5
→ Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models (2024), cited by 4
→ Reasoning Models Don't Always Say What They Think (2025), cited by 4
→ Many-shot Jailbreaking (2024), cited by 3
→ Auditing language models for hidden objectives (2025), cited by 1