Searching Better Architectures for Neural Machine Translation
Top 10% of 2020 papers by citations
Abstract
Neural architecture search (NAS) has played an important role in the evolution of neural architectures. However, not much attention has been paid to improving neural machine translation (NMT) through NAS approaches. In this work, we propose a gradient-based NAS algorithm for NMT, which automatically discovers architectures with better performance. Compared with previous NAS work, we jointly search the network operations (e.g., LSTM, CNN, self-attention, etc.) as well as dropout rates to ensure better results. We show that with reasonable resources it is possible to discover novel neural network architectures for NMT that achieve consistently better performance than Transformer [1], the state-of-the-art NMT model, across different tasks. On the WMT'14 English-to-German, IWSLT'14 German-to-English and WMT'18 Finnish-to-English translation tasks, our discovered architectures obtain 30.1, 36.1 and 26.4 BLEU scores respectively, which are significant improvements over the Transformer baselines. We also empirically verify that a model discovered on one task can be transferred to other tasks.
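The gradient-based search described above is typically built on a continuous relaxation of the discrete choice among candidate operations (as in DARTS-style NAS): each layer computes a softmax-weighted mixture of all candidates, so the architecture parameters can be optimized by gradient descent together with the network weights. A minimal sketch of this idea, using toy scalar operations as illustrative stand-ins for LSTM / CNN / self-attention (the operation set and names here are assumptions, not the paper's actual search space):

```python
import math

def softmax(xs):
    # Numerically stable softmax over architecture parameters
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Toy candidate operations standing in for LSTM / CNN / self-attention
OPS = [
    lambda x: x,        # identity stand-in
    lambda x: 2.0 * x,  # "conv"-like stand-in
    lambda x: -x,       # another stand-in
]

def mixed_op(x, alpha):
    """Continuous relaxation: the layer's output is a softmax-weighted
    sum of all candidate operations, so alpha is differentiable and can
    be trained alongside the network weights; after search, the op with
    the largest weight is kept."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, OPS))

# With uniform alpha the mixture averages the three ops: (x + 2x - x) / 3
print(mixed_op(3.0, [0.0, 0.0, 0.0]))  # -> 2.0
```

After search converges, each mixed operation is discretized by keeping only its highest-weighted candidate, yielding a concrete architecture.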
Related Papers
- Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder (2019), 65 citations
- Neural Machine Translation with the Transformer and Multi-Source Romance Languages for the Biomedical WMT 2018 task (2018), 15 citations
- Searching Better Architectures for Neural Machine Translation (2020), 27 citations
- Incorporating Pre-trained Model into Neural Machine Translation (2021), 2 citations
- On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation (2018)