Leandro von Werra
Research Areas
Topic Modeling, Software Engineering Research, Natural Language Processing Techniques, Multimodal Machine Learning Applications, Machine Learning and Data Classification
Most-Cited Works
- StarCoder: may the source be with you! (2023), 192 citations
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset (2023), 65 citations
- Zephyr: Direct Distillation of LM Alignment (2023), 53 citations
- SantaCoder: don't reach for the stars! (2023), 51 citations
- The Stack: 3 TB of permissively licensed source code (2022), 38 citations
- Design and Performance of Two Orthogonal Extraction Time-of-Flight Secondary Ion Mass Spectrometers for Focused Ion Beam Instruments (2014), 35 citations
- OctoPack: Instruction Tuning Code Large Language Models (2023), 16 citations
- Radiometric Characterization of a Water-Based Conical Blackbody Calibration Target for Millimeter-Wave Remote Sensing (2019), 14 citations
- Generative Adversarial Networks in Precision Oncology (2019), 7 citations
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2025), 6 citations