0 works · 0 citations · 0 h-index

Nathaniel Li

Meso Scale Discovery (United States) (US); National Patient Safety Foundation (US)

Research Areas

Explainable Artificial Intelligence (XAI), Topic Modeling, Artificial Intelligence in Healthcare and Education, Domain Adaptation and Few-Shot Learning, Adversarial Robustness in Machine Learning

Most-Cited Works

→ Its Alive: AI Independence Without Human Prompting (2023), 59 cited
→ Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark (2023), 27 cited
→ HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024), 20 cited
→ Humanity's Last Exam (2025), 14 cited
→ The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning (2024), 13 cited
→ A benchmark of expert-level academic questions to assess AI capabilities (2026), 2 cited
→ Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark (2025), 1 cited
→ LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet (2024), 1 cited
→ Just Read the Question: Enabling Generalization to New Assessment Items with Text Awareness (2025)
→ LLM Novice Uplift on Dual-Use, In Silico Biology Tasks (2026)