Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends
IEEE Signal Processing Magazine, 2015, Vol. 32(3), pp. 35–52
Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiaojun Qian, Helen Meng, Li Deng
Abstract
Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, both model types are limited in their ability to represent the complex, nonlinear relationships between the speech generation inputs and the acoustic features. Inspired by the intrinsically hierarchical process of human speech production and by the successful application of deep neural networks (DNNs) to automatic speech recognition (ASR), deep learning techniques have also been applied successfully to speech generation, as reported in recent literature.
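The core idea the abstract describes, replacing the HMM/GMM mapping with a neural network that predicts acoustic features directly from linguistic input features, can be sketched as a minimal forward pass. This is an illustrative toy, not the paper's implementation: the dimensions, the single hidden layer, and the randomly initialized weights (standing in for trained parameters) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a per-frame linguistic feature vector (phoneme
# identity, positional and prosodic context) mapped to an acoustic feature
# vector (e.g., mel-cepstral coefficients plus excitation parameters).
LING_DIM, HIDDEN_DIM, ACOUSTIC_DIM = 300, 256, 127

# Randomly initialized weights stand in for parameters learned by training.
W1 = rng.standard_normal((LING_DIM, HIDDEN_DIM)) * 0.01
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.standard_normal((HIDDEN_DIM, ACOUSTIC_DIM)) * 0.01
b2 = np.zeros(ACOUSTIC_DIM)

def dnn_acoustic_model(linguistic_frames: np.ndarray) -> np.ndarray:
    """Map per-frame linguistic features to acoustic features.

    Unlike a decision-tree-clustered HMM/GMM, the hidden layer lets the
    mapping be a smooth nonlinear function of the whole input vector.
    """
    h = np.tanh(linguistic_frames @ W1 + b1)  # nonlinear hidden layer
    return h @ W2 + b2                        # linear acoustic output layer

frames = rng.standard_normal((5, LING_DIM))   # 5 synthetic input frames
acoustic = dnn_acoustic_model(frames)
print(acoustic.shape)  # (5, 127)
```

In a real parametric synthesizer the predicted acoustic sequence would then be smoothed (e.g., with dynamic-feature constraints) and passed to a vocoder to generate the waveform.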
Related Papers
- Identification of Objectionable Audio Segments Based on Pseudo and Heterogeneous Mixture Models (2012), 7 citations
- New technique to use the GMM in speaker recognition system (SRS) (2013), 8 citations
- Comparative Assessment of Pulsar Families using GMM and DPGMM (2019), 1 citation
- Mode merging for the finite mixture of t‐distributions (2021), 1 citation
- Weight Based Super-Gmm For Speaker Identification Systems (2008), 1 citation