Simultaneous Speech-to-Speech Translation System with Transformer-Based Incremental ASR, MT, and TTS
2021, Vol. 9, pp. 186–192
Ryo Fukuda, Sashi Novitasari, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Tomoya Yanagita, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
Abstract
In this paper, we present an English-to-Japanese simultaneous speech-to-speech translation (S2ST) system. It consists of three Transformer-based incremental processing modules: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). We also evaluate the system-level latency, in addition to the latency and accuracy of each module.