Noisy speech database for training speech enhancement algorithms and TTS models
Citations Over Time
Abstract
Clean and noisy parallel speech database. The database was designed to train and test speech enhancement methods that operate at 48kHz. A more detailed description can be found in the papers associated with the database. For the 28 speaker dataset, details can be found in: C. Valentini-Botinhao, X. Wang, S. Takaki & J. Yamagishi, "Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks", In Proc. Interspeech 2016. For the 56 speaker dataset: C. Valentini-Botinhao, X. Wang, S. Takaki & J. Yamagishi, "Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech”, In Proc. SSW 2016. Some of the noises used to create the noisy speech were obtained from the Demand database, available here: http://parole.loria.fr/DEMAND/ . The speech database was obtained from the CSTR VCTK Corpus, available here: https://doi.org/10.7488/ds/1994. The speech-shaped and babble noise files that were used to create this dataset are available here: http://homepages.inf.ed.ac.uk/cvbotinh/se/noises/.