
Hidden semi-Markov model based speech synthesis

2004, pp. 1393–1396


Heiga Zen, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura

Abstract

In the present paper, a hidden semi-Markov model (HSMM) based speech synthesis system is proposed. In the hidden Markov model (HMM) based speech synthesis system that we have proposed, rhythm and tempo are controlled by state duration probability distributions, each modeled by a single Gaussian distribution. To synthesize speech, the system constructs a sentence HMM corresponding to an arbitrarily given text, determines the state durations that maximize their probabilities, and then generates a speech parameter vector sequence for the resulting state sequence. However, there is an inconsistency: although speech is synthesized from HMMs with explicit state duration probability distributions, the HMMs are trained without them. In the present paper, we introduce an HSMM, which is an HMM with explicit state duration probability distributions, into the HMM-based speech synthesis system. Experimental results show that HSMM training improves the naturalness of the synthesized speech.
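The abstract says that, at synthesis time, state durations are chosen to maximize their probabilities under single Gaussian duration distributions. A minimal sketch of that step follows. It assumes the closed form commonly used in HMM-based synthesis, which the abstract itself does not spell out: maximizing the sum of Gaussian log densities over the states, optionally subject to a fixed total length T, gives d_k = m_k + ρσ_k² with ρ = (T − Σm_k) / Σσ_k², and ρ = 0 (each duration equal to its mean) when T is left free. The function name `state_durations` and its parameters are hypothetical. The sketch covers only duration determination; it does not show HSMM training.

```python
from typing import List, Optional, Sequence


def state_durations(means: Sequence[float],
                    variances: Sequence[float],
                    total_frames: Optional[int] = None) -> List[int]:
    """Hypothetical helper: pick state durations (in frames) that maximize
    the summed Gaussian log-likelihood of the durations.

    Assumed closed form: d_k = m_k + rho * var_k, where
    rho = (T - sum(means)) / sum(variances) if a total length T is given,
    and rho = 0 otherwise (each state takes its mean duration).
    """
    if len(means) != len(variances):
        raise ValueError("means and variances must have the same length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    if total_frames is None:
        rho = 0.0  # no length constraint: the maximum is at the means
    else:
        # Lagrange multiplier that makes the real-valued durations sum to T
        rho = (total_frames - sum(means)) / sum(variances)

    durations = []
    for m, v in zip(means, variances):
        d = m + rho * v
        # Round to whole frames and keep at least one frame per state.
        # Rounding can make the total differ slightly from total_frames.
        durations.append(max(1, int(round(d))))
    return durations


if __name__ == "__main__":
    means = [3.0, 5.0, 2.0]
    variances = [1.0, 4.0, 1.0]
    print(state_durations(means, variances))      # unconstrained
    print(state_durations(means, variances, 16))  # stretched to 16 frames
```

Under these assumptions, stretching the utterance assigns most of the extra frames to the state with the largest duration variance. This reflects the idea that states whose durations vary more in training data should absorb more of a tempo change.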
