Carson Denison
Research Areas
Topic Modeling, Ethics and Social Impacts of AI, Adversarial Robustness in Machine Learning, Explainable Artificial Intelligence (XAI), Natural Language Processing Techniques
Most-Cited Works
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (2024), cited by 31
- Measuring Faithfulness in Chain-of-Thought Reasoning (2023), cited by 21
- Alignment faking in large language models (2024), cited by 17
- Question Decomposition Improves the Faithfulness of Model-Generated Reasoning (2023), cited by 7
- Gradient-Based Language Model Red Teaming (2024), cited by 5
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models (2024), cited by 4
- Reasoning Models Don't Always Say What They Think (2025), cited by 4
- Many-shot Jailbreaking (2024), cited by 3
- Auditing language models for hidden objectives (2025), cited by 1