Searching Better Architectures for Neural Machine Translation
Top 10% of 2020 papers by citations
Abstract
Neural architecture search (NAS) has played an important role in the evolution of neural architectures. However, not much attention has been paid to improving neural machine translation (NMT) through NAS approaches. In this work, we propose a gradient-based NAS algorithm for NMT, which automatically discovers architectures with better performance. Compared with previous NAS work, we jointly search the network operations (e.g., LSTM, CNN, self-attention, etc.) as well as dropout rates to ensure better results. We show that with reasonable resources it is possible to discover novel neural network architectures for NMT that achieve consistently better performance than Transformer [1], the state-of-the-art NMT model, across different tasks. On the WMT'14 English-to-German, IWSLT'14 German-to-English and WMT'18 Finnish-to-English translation tasks, our discovered architectures obtain 30.1, 36.1 and 26.4 BLEU scores respectively, which are significant improvements over the Transformer baselines. We also empirically verify that a model discovered on one task can be transferred to other tasks.
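The gradient-based search described above is typically built on a continuous relaxation of the discrete choice among candidate operations (as in DARTS-style NAS): each layer computes a softmax-weighted mixture of all candidates, so the architecture parameters can be optimized by gradient descent together with the network weights. A minimal sketch of this idea, using toy scalar operations as illustrative stand-ins for LSTM / CNN / self-attention (the operation set and names here are assumptions, not the paper's actual search space):

```python
import math

def softmax(xs):
    # Numerically stable softmax over architecture parameters
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Toy candidate operations standing in for LSTM / CNN / self-attention
OPS = [
    lambda x: x,        # identity stand-in
    lambda x: 2.0 * x,  # "conv"-like stand-in
    lambda x: -x,       # another stand-in
]

def mixed_op(x, alpha):
    """Continuous relaxation: the layer's output is a softmax-weighted
    sum of all candidate operations, so alpha is differentiable and can
    be trained alongside the network weights; after search, the op with
    the largest weight is kept."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, OPS))

# With uniform alpha the mixture averages the three ops: (x + 2x - x) / 3
print(mixed_op(3.0, [0.0, 0.0, 0.0]))  # -> 2.0
```

After search converges, each mixed operation is discretized by keeping only its highest-weighted candidate, yielding a concrete architecture.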
Related Papers
- Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder (2019), 65 citations
- Neural Machine Translation with the Transformer and Multi-Source Romance Languages for the Biomedical WMT 2018 task (2018), 15 citations
- Searching Better Architectures for Neural Machine Translation (2020), 27 citations
- Incorporating Pre-trained Model into Neural Machine Translation (2021), 2 citations
- On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation (2018)