LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
2019, pp. 1526–1530
Abstract
This paper introduces a new speech corpus called "LibriTTS", designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits the desirable properties of the LibriSpeech corpus while addressing a number of issues that make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at a 24 kHz sampling rate from 2,456 speakers, together with the corresponding texts. Experimental results show that neural end-to-end TTS models trained on the LibriTTS corpus achieved mean opinion scores above 4.0 for naturalness for five of the six evaluation speakers.