Statistical parametric speech synthesis using deep neural networks
Top 1% of 2013 papers by citations
Abstract
Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent the probability densities of speech parameters given text. Speech parameters are generated from these densities so as to maximize their output probabilities, and a speech waveform is then reconstructed from the generated parameters. This approach is reasonably effective but has several limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN): the relationship between input texts and their acoustic realizations is modeled directly by a DNN, which can address some of the limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
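The core idea described in the abstract, mapping a linguistic feature vector derived from text to a frame of acoustic parameters with a feedforward network, can be sketched as follows. This is a minimal illustrative forward pass only, not the paper's exact configuration: the layer sizes, activation choice, and feature dimensions here are assumptions for illustration, and a real system would also train the weights (e.g. by minimizing mean squared error per frame) and reconstruct the waveform with a vocoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear activation for hidden layers.
    return np.maximum(x, 0.0)

class SimpleDNN:
    """Fully connected network: linguistic features -> acoustic parameters.

    Sketch only; sizes are hypothetical (e.g. input = binary/numeric answers
    to questions about phonetic and prosodic context, output = one frame of
    spectral and excitation parameters).
    """
    def __init__(self, sizes):
        # He-style random initialization; a trained system would learn these.
        self.weights = [rng.standard_normal((m, n)) * np.sqrt(2.0 / m)
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = relu(x @ W + b)
        # Linear output layer: regression onto continuous acoustic parameters.
        return x @ self.weights[-1] + self.biases[-1]

# Hypothetical dimensions: 342 linguistic features in, 127 acoustic
# parameters out, three hidden layers of 1024 units.
net = SimpleDNN([342, 1024, 1024, 1024, 127])
linguistic_frame = rng.standard_normal(342)   # one frame of input features
acoustic_frame = net.forward(linguistic_frame)
print(acoustic_frame.shape)                   # one acoustic parameter vector
```

Because the same network is shared across all contexts, it replaces the decision-tree clustering of context-dependent HMM states with a single learned mapping, which is where the abstract's claimed modeling advantage comes from.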