Adam Khoja
National Patient Safety Foundation (US)
Research Areas
Adversarial Robustness in Machine Learning, Explainable Artificial Intelligence (XAI), Natural Language Processing Techniques, Ethics and Social Impacts of AI, Topic Modeling
Most-Cited Works
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning (2024), 13 citations
- Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs (2025), 5 citations
- Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? (2024), 4 citations
- EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges (2025), 1 citation
- A Definition of AGI (2025)
- Multi-Agent Inverse Q-Learning from Demonstrations (2025)
- The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems (2025)
- Governing Automated Strategic Intelligence (2025)