Matthew Rahtz | doi.page

0 works, 0 citations, 0 h-index

Research Areas

Explainable Artificial Intelligence (XAI), Topic Modeling, Natural Language Processing Techniques, Reinforcement Learning in Robotics, Ethics and Social Impacts of AI

Most-Cited Works

→ Ensembl 2016 (2015), 1,352 cited
→ Evaluating Frontier Models for Dangerous Capabilities (2024), 9 cited
→ A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI (2024), 7 cited
→ Tracr: Compiled Transformers as a Laboratory for Interpretability (2023), 6 cited
→ Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla (2023), 6 cited
→ Safe Deep RL in 3D Environments using Human Feedback (2022), 2 cited
→ The Hydra Effect: Emergent Self-repair in Language Model Computations (2023), 2 cited
→ An Extensible Interactive Interface for Agent Design (2019), 1 cited
→ Truth in the 'killer robots' angle? (2017)