End-to-End Speech Translation with the Transformer
Abstract
Speech Translation has traditionally been addressed by concatenating two tasks: Speech Recognition and Machine Translation. The main drawback of this approach is that errors are propagated from one system to the next. Recently, neural approaches to Speech Recognition and Machine Translation have made it possible to address the task with an End-to-End Speech Translation architecture. In this paper, we propose using the Transformer architecture, which is based solely on attention mechanisms, to build the End-to-End Speech Translation system. As a contrastive architecture, we use the same Transformer to build the Speech Recognition and Machine Translation systems and perform Speech Translation by concatenating them. Results on a standard Spanish-to-English task show that the end-to-end architecture outperforms the concatenated systems by half a BLEU point.
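The difference between the two setups compared in the abstract can be sketched as plain function composition. This is only an illustration of the data flow, not the paper's implementation: the names `make_cascade`, `toy_asr`, `toy_mt`, and `toy_e2e`, the utterance ids, and the lookup tables are all hypothetical stand-ins for trained Transformer models, and the deliberate recognition error is invented to show how a mistake in the first stage reaches the second.

```python
from typing import Callable

# Hypothetical aliases: audio is represented by an utterance id for illustration.
Audio = str
Text = str


def make_cascade(
    asr: Callable[[Audio], Text],
    mt: Callable[[Text], Text],
) -> Callable[[Audio], Text]:
    """Compose a recognizer and a translator into a concatenated system."""

    def translate(audio: Audio) -> Text:
        transcript = asr(audio)  # stage 1: Spanish transcript
        return mt(transcript)    # stage 2: sees only the transcript, errors included

    return translate


# Toy lookup tables standing in for trained models (assumed data, not from the paper).
TOY_ASR = {
    "utt1": "buenos días",
    "utt2": "buenas nochas",  # deliberate recognition error ("noches" expected)
}
TOY_MT = {
    "buenos días": "good morning",
    "buenas noches": "good night",
}
TOY_E2E = {
    "utt1": "good morning",
    "utt2": "good night",
}


def toy_asr(audio: Audio) -> Text:
    return TOY_ASR.get(audio, "<unk>")


def toy_mt(text: Text) -> Text:
    return TOY_MT.get(text, "<unk>")


def toy_e2e(audio: Audio) -> Text:
    # End-to-end: a single mapping from audio to English text, no transcript step.
    return TOY_E2E.get(audio, "<unk>")


cascade = make_cascade(toy_asr, toy_mt)

if __name__ == "__main__":
    for utt in ("utt1", "utt2"):
        print(utt, "| cascade:", cascade(utt), "| end-to-end:", toy_e2e(utt))
```

In this toy setup the misrecognized transcript for `utt2` is unknown to the translator, so the concatenated system fails on it, whereas the end-to-end mapping has no intermediate transcript to get wrong. A real end-to-end model can of course make its own errors; the sketch only shows where the intermediate representation sits in each design.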