Kshitij Sachan
Research Areas
Adversarial Robustness in Machine Learning, Topic Modeling, Neural Networks and Applications, Ethics and Social Impacts of AI, Model Reduction and Neural Networks
Most-Cited Works
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (2024), 31 citations
- Debating with More Persuasive LLMs Leads to More Truthful Answers (2024), 9 citations
- Polysemanticity and Capacity in Neural Networks (2022), 4 citations
- AI Control: Improving Safety Despite Intentional Subversion (2023), 3 citations