0 works · 0 citations · 0 h-index

Ben Mann

Research Areas

Topic Modeling, Natural Language Processing Techniques, Explainable Artificial Intelligence (XAI), Multimodal Machine Learning Applications, Software Engineering Research

Most-Cited Works

→ Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (2022), 360 citations
→ CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence (2022), 295 citations
→ Predictability and Surprise in Large Generative Models (2022), 171 citations
→ Language Models (Mostly) Know What They Know (2022), 159 citations
→ Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned (2022), 99 citations
→ In-context Learning and Induction Heads (2022), 84 citations
→ The Capacity for Moral Self-Correction in Large Language Models (2023), 48 citations
→ Discovering Language Model Behaviors with Model-Written Evaluations (2022), 40 citations
→ Measuring Progress on Scalable Oversight for Large Language Models (2022), 31 citations
→ A General Language Assistant as a Laboratory for Alignment (2021), 27 citations