Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Abstract
Direct speech-to-speech translation (S2ST) has attracted increasing attention recently. The task is very challenging because of data scarcity and the complex mapping from source speech to target speech. In this paper, we report our recent progress in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we use external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo-labeled data with a combination of popular techniques whose application to S2ST is non-trivial. Moreover, we evaluate our approach on both a syntactically similar language pair (Spanish-English) and a syntactically distant one (English-Chinese). Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.
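The abstract does not describe how the pseudo labels are produced. One common way to pseudo-label external source-language speech for S2ST is a cascade: transcribe the audio with ASR, translate the transcript with MT, and synthesize the translation with TTS, yielding (source speech, target speech) pairs. The minimal sketch below assumes that cascade. The models are passed in as plain callables, and `pseudo_label`, `PseudoPair`, the `min_confidence` filter, and the toy stubs are hypothetical illustrations, not the authors' implementation (the linked repository holds that).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical sketch: an ASR -> MT -> TTS cascade used to pseudo-label
# unlabeled source-language speech. Not the paper's actual pipeline.

Audio = List[float]  # placeholder waveform representation (assumption)


@dataclass
class PseudoPair:
    source_audio: Audio  # original external source-language speech
    target_audio: Audio  # synthesized target-language speech (pseudo label)
    target_text: str     # intermediate pseudo translation, e.g. for auxiliary losses


def pseudo_label(
    unlabeled_audio: Iterable[Audio],
    asr: Callable[[Audio], Tuple[str, float]],  # returns (transcript, confidence)
    mt: Callable[[str], str],                   # source text -> target text
    tts: Callable[[str], Audio],                # target text -> target speech
    min_confidence: float = 0.5,                # hypothetical noise filter threshold
) -> List[PseudoPair]:
    pairs: List[PseudoPair] = []
    for audio in unlabeled_audio:
        transcript, confidence = asr(audio)
        if confidence < min_confidence:
            continue  # skip low-confidence transcripts to limit label noise
        translation = mt(transcript)
        pairs.append(PseudoPair(audio, tts(translation), translation))
    return pairs


# Toy stand-ins for real models, for illustration only.
def toy_asr(audio: Audio) -> Tuple[str, float]:
    return "hello world", (0.9 if len(audio) > 2 else 0.2)


def toy_mt(text: str) -> str:
    return {"hello world": "hola mundo"}.get(text, text)


def toy_tts(text: str) -> Audio:
    return [float(len(word)) for word in text.split()]


pairs = pseudo_label([[0.1, 0.2, 0.3], [0.5]], toy_asr, toy_mt, toy_tts)
print(len(pairs), pairs[0].target_text, pairs[0].target_audio)
```

With the toy stubs, the second clip is dropped by the confidence filter, so a single pair survives with target text "hola mundo". Keeping the intermediate translation alongside the synthesized speech is a design choice that lets the same pseudo data also feed text-side auxiliary objectives, if a model uses them.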
Related Papers
- The Kyoto Speech-to-Speech Translation System for IWSLT 2023 (2023), 2 citations
- Ogmios: The UPC Text-to-Speech synthesis system for Spoken Translation (2006)
- English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor (2017), 2 citations
- Recurrent Stacking of Layers for Compact Neural Machine Translation Models (2018), 2 citations
- Statistical vowelization of Arabic text for speech synthesis in speech-to-speech translation systems (2007), 1 citation