PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Abstract
This paper introduces PnG BERT, a new encoder model for neural TTS. This model is augmented from the original BERT model by taking both phoneme and grapheme representations of text as input, as well as the word-level alignment between them. It can be pre-trained on a large text corpus in a self-supervised manner, and fine-tuned in a TTS task. Experimental results show that a neural TTS model using a pre-trained PnG BERT as its encoder yields more natural prosody and more accurate pronunciation than a baseline model using only phoneme input with no pre-training. Subjective side-by-side preference evaluations show that raters have no statistically significant preference between the speech synthesized using PnG BERT and ground-truth recordings from professional speakers.
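The abstract's core idea is that BERT receives both a phoneme and a grapheme rendering of the same text, with a word-level alignment between the two. The sketch below is a minimal, hypothetical illustration of how such an input could be assembled, not the authors' code: it assumes BERT-style [CLS]/[SEP] tokens, one segment ID per sub-sequence, and shared word-position indices as the alignment signal. The function name, token values, and ID scheme are all assumptions for illustration.

```python
from typing import List, Tuple


def build_png_bert_input(
    words: List[Tuple[List[str], List[str]]],
) -> Tuple[List[str], List[int], List[int]]:
    """Assemble a PnG-BERT-style input sequence (illustrative sketch).

    Each entry in `words` pairs one word's phonemes with its graphemes.
    Returns (tokens, segment_ids, word_positions): segment 0 marks the
    phoneme sub-sequence, segment 1 the grapheme sub-sequence, and tokens
    belonging to the same word share a word position in both segments,
    which is one plausible way to encode the word-level alignment.
    """
    tokens, segments, word_pos = ["[CLS]"], [0], [0]

    # Phoneme sub-sequence first: one shared word position per source word.
    for i, (phonemes, _) in enumerate(words, start=1):
        for p in phonemes:
            tokens.append(p)
            segments.append(0)
            word_pos.append(i)
    tokens.append("[SEP]")
    segments.append(0)
    word_pos.append(len(words) + 1)

    # Grapheme sub-sequence second, reusing the same word positions so the
    # model can relate the two representations of each word.
    for i, (_, graphemes) in enumerate(words, start=1):
        for g in graphemes:
            tokens.append(g)
            segments.append(1)
            word_pos.append(i)
    tokens.append("[SEP]")
    segments.append(1)
    word_pos.append(len(words) + 1)

    return tokens, segments, word_pos


if __name__ == "__main__":
    # Toy example with CMU-style phonemes and character graphemes (assumed).
    sent = [
        (["HH", "AH0", "L", "OW1"], ["h", "e", "l", "l", "o"]),
        (["W", "ER1", "L", "D"], ["w", "o", "r", "l", "d"]),
    ]
    for token, seg, pos in zip(*build_png_bert_input(sent)):
        print(token, seg, pos)
```

Under these assumptions, pre-training would run masked-language-model training over such sequences on plain text, and fine-tuning would plug the resulting encoder into the TTS model, as the abstract describes.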