Statistical parametric speech synthesis using deep neural networks
Top 1% of 2013 papers by citations
Abstract
Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent the probability densities of speech parameters given text. Speech parameters are generated from these densities so as to maximize their output probabilities, and a speech waveform is then reconstructed from the generated parameters. This approach is reasonably effective but has several limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN): the relationship between input texts and their acoustic realizations is modeled directly by a DNN, which can address some of the limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
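The core idea described in the abstract, mapping a linguistic feature vector derived from text to a frame of acoustic parameters with a feedforward network, can be sketched as follows. This is a minimal illustrative forward pass only, not the paper's exact configuration: the layer sizes, activation choice, and feature dimensions here are assumptions for illustration, and a real system would also train the weights (e.g. by minimizing mean squared error per frame) and reconstruct the waveform with a vocoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear activation for hidden layers.
    return np.maximum(x, 0.0)

class SimpleDNN:
    """Fully connected network: linguistic features -> acoustic parameters.

    Sketch only; sizes are hypothetical (e.g. input = binary/numeric answers
    to questions about phonetic and prosodic context, output = one frame of
    spectral and excitation parameters).
    """
    def __init__(self, sizes):
        # He-style random initialization; a trained system would learn these.
        self.weights = [rng.standard_normal((m, n)) * np.sqrt(2.0 / m)
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = relu(x @ W + b)
        # Linear output layer: regression onto continuous acoustic parameters.
        return x @ self.weights[-1] + self.biases[-1]

# Hypothetical dimensions: 342 linguistic features in, 127 acoustic
# parameters out, three hidden layers of 1024 units.
net = SimpleDNN([342, 1024, 1024, 1024, 127])
linguistic_frame = rng.standard_normal(342)   # one frame of input features
acoustic_frame = net.forward(linguistic_frame)
print(acoustic_frame.shape)                   # one acoustic parameter vector
```

Because the same network is shared across all contexts, it replaces the decision-tree clustering of context-dependent HMM states with a single learned mapping, which is where the abstract's claimed modeling advantage comes from.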