Discovering Language Model Behaviors with Model-Written Evaluations
Citations: Top 1% of 2023 papers
Citation
Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan. Findings of the Association for Computational Linguistics: ACL 2023. 2023.