GlottDNN — A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis
Top 10% of 2016 papers (by citations)
Abstract
GlottHMM is a previously developed vocoder that has been used successfully in HMM-based synthesis. It parameterizes speech into two parts, the glottal flow and the vocal tract, following the functioning of the human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. GlottDNN builds on the principles of its predecessor, GlottHMM, but introduces three main improvements: (1) a new, more accurate glottal inverse filtering method, (2) a new deep neural network (DNN)-based method for generating the glottal excitation, and (3) a new approach to band-wise processing of full-band speech. The proposed vocoder was evaluated as part of a full-band, state-of-the-art DNN-based text-to-speech (TTS) synthesis system and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of a subjective listening test indicate that GlottDNN improves TTS quality over both compared vocoders.
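The core idea of a glottal vocoder, splitting speech into an excitation signal and a vocal tract filter, can be illustrated with a minimal source-filter sketch. It assumes plain LPC (autocorrelation method with the Levinson-Durbin recursion) as the vocal tract model and a simple impulse train in place of a real glottal flow. GlottDNN itself uses a more accurate glottal inverse filtering method and a DNN to generate the excitation, and neither is reproduced here. The function names, the 8 kHz sampling rate, the 100 Hz pitch, and the resonance settings are hypothetical choices for illustration only.

```python
import numpy as np


def lpc(frame, order):
    """Estimate an all-pole vocal tract model (autocorrelation method,
    Levinson-Durbin recursion). Returns (a, err), where a = [1, a_1, ..., a_p]
    defines the inverse filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and err is
    the final prediction error energy."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient for stage i
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """FIR filter A(z): removes the modeled resonances and leaves an estimate
    of the excitation (zero initial conditions)."""
    return np.convolve(x, a)[: len(x)]


def synthesize(excitation, a):
    """All-pole filter 1/A(z): imposes the vocal tract model on an excitation."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


# Hypothetical toy signal: an impulse train stands in for glottal pulses.
fs, f0, num_samples = 8000, 100, 2000
excitation = np.zeros(num_samples)
excitation[:: fs // f0] = 1.0

# Hypothetical "vocal tract" with two resonances (500 Hz and 1500 Hz).
poles = [0.95 * np.exp(2j * np.pi * 500 / fs), 0.9 * np.exp(2j * np.pi * 1500 / fs)]
a_true = np.real(np.poly(poles + [np.conj(p) for p in poles]))
speech = synthesize(excitation, a_true)

# Analysis: estimate the filter, then inverse filter to get the excitation.
a_hat, _ = lpc(speech * np.hanning(num_samples), order=4)
residual = inverse_filter(speech, a_hat)
# Synthesis: passing the residual back through 1/A(z) rebuilds the signal.
rebuilt = synthesize(residual, a_hat)
```

Inverse filtering followed by all-pole synthesis with the same coefficients is an exact round trip, so `rebuilt` matches `speech` up to floating-point error. In a vocoder the excitation would instead be modeled and regenerated (in GlottDNN, by a DNN) rather than copied from the analysis stage.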