Loubna Ben Allal
Research Areas
Topic Modeling, Natural Language Processing Techniques, Software Engineering Research, Point processes and geometric inequalities, Scheduling and Timetabling Solutions
Most-Cited Works
- StarCoder: may the source be with you! (2023), 192 citations
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset (2023), 65 citations
- SantaCoder: don't reach for the stars! (2023), 51 citations
- The Stack: 3 TB of permissively licensed source code (2022), 38 citations
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale (2024), 19 citations
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2025), 6 citations
- Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (2024), 2 citations
- SmolVLM: Redefining small and efficient multimodal models (2025), 2 citations
- The BigCode Project Governance Card (2023), 1 citation