David Lindner
Research Areas
Reinforcement Learning in Robotics, Adversarial Robustness in Machine Learning, Advanced Bandit Algorithms Research, Explainable Artificial Intelligence (XAI), Multimodal Machine Learning Applications
Most-Cited Works
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (2023), 89 citations
- Red-Teaming the Stable Diffusion Safety Filter (2022), 25 citations
- Sensing social media signals for cryptocurrency news (2019), 14 citations
- Evaluating Frontier Models for Dangerous Capabilities (2024), 9 citations
- GoSafeOpt: Scalable safe exploration for global optimization of dynamical systems (2023), 9 citations
- Active Exploration for Inverse Reinforcement Learning (2022), 6 citations
- Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning (2023), 6 citations
- Tracr: Compiled Transformers as a Laboratory for Interpretability (2023), 6 citations
- On scalable oversight with weak LLMs judging strong LLMs (2024), 6 citations
- Learning Safety Constraints from Demonstrations with Unknown Rewards (2023), 4 citations