Jeffrey Ladish
Research Areas
Adversarial Robustness in Machine Learning, Topic Modeling, Multi-Agent Systems and Negotiation, Artificial Intelligence in Healthcare and Education, DNA and Biological Computing
Most-Cited Works
- CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence (2022), 295 citations
- Measuring Progress on Scalable Oversight for Large Language Models (2022), 31 citations
- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B (2023), 11 citations
- BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B (2023), 1 citation
- Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits (2024)