Nathaniel Li
Meso Scale Discovery (US); National Patient Safety Foundation (US)
Research Areas
Explainable Artificial Intelligence (XAI), Topic Modeling, Artificial Intelligence in Healthcare and Education, Domain Adaptation and Few-Shot Learning, Adversarial Robustness in Machine Learning
Most-Cited Works
- Its Alive: AI Independence Without Human Prompting (2023), 59 cited
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark (2023), 27 cited
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024), 20 cited
- Humanity's Last Exam (2025), 14 cited
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning (2024), 13 cited
- A benchmark of expert-level academic questions to assess AI capabilities (2026), 2 cited
- Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark (2025), 1 cited
- LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet (2024), 1 cited
- Just Read the Question: Enabling Generalization to New Assessment Items with Text Awareness (2025)
- LLM Novice Uplift on Dual-Use, In Silico Biology Tasks (2026)