
Hierarchical prosody modeling of English speech and its application to TTS

2014, Vol. 20, pp. 1–6


Chung-Yao Tsai, Chin-Kuan Kuo, Yih‐Ru Wang, Sin‐Horng Chen, I-Bin Liao, Chen-Yu Chiang

Abstract

This paper proposes a hierarchical prosody modeling approach for English speech, extending the HPM approach previously proposed for Mandarin speech. The approach first designs a syllable-based statistical prosodic model that describes the relationships among the prosodic-acoustic features of the speech signal, the linguistic features of the associated text, and the prosodic tags representing the underlying prosodic structure of the speech. It then employs a prosody labeling and modeling algorithm to estimate the model parameters and label the prosodic tags of all training utterances simultaneously from a prosody-unlabeled speech corpus. Experimental results on a corpus containing many paragraph-length utterances from a female Chinese speaker who majored in English show that the inferred model parameters are all meaningful. The trained model is then used to generate prosodic information for a TTS system. An informal listening test shows that the synthetic speech sounds quite natural.
