Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University
Abstract
AppTek and RWTH Aachen University team up to participate in the offline and simultaneous speech translation tracks of IWSLT 2020. For the offline task, we create both cascaded and end-to-end speech translation systems, paying attention to careful data selection and weighting. In the cascaded approach, we combine high-quality hybrid automatic speech recognition (ASR) with Transformer-based neural machine translation (NMT). Our end-to-end direct speech translation systems benefit from pretraining of adapted encoder and decoder components, as well as from synthetic data and fine-tuning, and are thus able to compete with cascaded systems in terms of MT quality. For simultaneous translation, we utilize a novel architecture that makes dynamic decisions, learned from parallel data, about when to continue reading input and when to generate output words. Experiments with speech and text input show that even at low latency this architecture leads to superior translation results.