Anton Lozhkov
Research Areas
Natural Language Processing Techniques, Topic Modeling, Multimodal Machine Learning Applications, Speech Recognition and Synthesis, Advanced Neural Network Applications
Most-Cited Works
- OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents (2023), 48 citations
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale (2024), 19 citations
- XTREME-S: Evaluating Cross-lingual Speech Representations (2022), 14 citations
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2025), 6 citations
- SmolVLM: Redefining small and efficient multimodal models (2025), 2 citations
- huggingface/datasets: 1.13.3 (2021)