Fazl Barez
University of Oxford (GB), Renmin University of China (CN), University of Edinburgh (GB)
Research Areas
Topic Modeling, Explainable Artificial Intelligence (XAI), Natural Language Processing Techniques, Ethics and Social Impacts of AI, Adversarial Robustness in Machine Learning
Most-Cited Works
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (2024), cited by 31
- Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark (2023), cited by 17
- The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python (2023), cited by 11
- Exploring the Advantages of Transformers for High-Frequency Trading (2023), cited by 6
- Risks and Opportunities of Open-Source Generative AI (2024), cited by 6
- Benchmarking Specialized Databases for High-frequency Data (2023), cited by 5
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models (2024), cited by 4
- Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions (2024), cited by 3
- Fairness in AI and Its Long-Term Implications on Society (2023), cited by 3
- Near to Mid-term Risks and Opportunities of Open-Source Generative AI (2024), cited by 3