
MLS: A Large-Scale Multilingual Dataset for Speech Research

2020, pp. 2757–2761


Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert

Abstract

This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and covers 8 languages, including about 44.5K hours of English and a total of about 6K hours for the other languages. Additionally, we provide Language Models (LM) and baseline Automatic Speech Recognition (ASR) models for all the languages in our dataset. We believe such a large transcribed dataset will open new avenues in ASR and Text-To-Speech (TTS) research. The dataset will be made freely available for anyone at http://www.openslr.org.
