Recurrent Stacking of Layers for Compact Neural Machine Translation Models
Abstract
In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder. Adding each new layer improves translation quality significantly, but it also significantly increases the number of parameters. In this paper, we propose to share parameters across all the layers, thereby obtaining a recurrently stacked NMT model. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times is comparable to that of a model that stacks 6 separate layers. We also show that using pseudo-parallel corpora generated by back-translation leads to further significant improvements in translation quality.
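The difference between stacking 6 separate layers and recurrently stacking one layer 6 times can be illustrated with a toy sketch. This is not the paper's architecture: it assumes a hypothetical residual feed-forward layer (`FeedForwardLayer`) in place of the actual recurrent or Transformer encoder/decoder layers, and all names (`vanilla_stack`, `recurrent_stack`, `count_unique_params`) are illustrative only. The point it shows is that both stacks apply a layer 6 times in the forward pass, but the recurrent stack stores only one layer's parameters.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


class FeedForwardLayer:
    """Hypothetical toy layer (not the paper's): y = relu(W x + b) + x."""

    def __init__(self, d_model: int, rng: random.Random) -> None:
        self.weight: Matrix = [
            [rng.uniform(-0.1, 0.1) for _ in range(d_model)] for _ in range(d_model)
        ]
        self.bias: Vector = [0.0] * d_model

    def num_params(self) -> int:
        # d_model * d_model weights plus d_model biases
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def __call__(self, x: Vector) -> Vector:
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weight, self.bias)
        ]
        # Residual connection keeps the output dimension equal to d_model
        return [h + xi for h, xi in zip(hidden, x)]


def vanilla_stack(d_model: int, depth: int, seed: int = 0) -> List[FeedForwardLayer]:
    """Conventional stacking: `depth` independent layers, each with its own weights."""
    rng = random.Random(seed)
    return [FeedForwardLayer(d_model, rng) for _ in range(depth)]


def recurrent_stack(d_model: int, depth: int, seed: int = 0) -> List[FeedForwardLayer]:
    """Recurrent stacking: the same layer object is reused `depth` times."""
    rng = random.Random(seed)
    shared = FeedForwardLayer(d_model, rng)
    return [shared] * depth  # every entry references the same parameters


def forward(layers: List[FeedForwardLayer], x: Vector) -> Vector:
    # Both stacks perform `depth` layer applications; only parameter storage differs
    for layer in layers:
        x = layer(x)
    return x


def count_unique_params(layers: List[FeedForwardLayer]) -> int:
    """Count parameters once per distinct layer object (shared layers count once)."""
    unique = {id(layer): layer for layer in layers}
    return sum(layer.num_params() for layer in unique.values())


if __name__ == "__main__":
    d_model, depth = 8, 6
    vanilla = vanilla_stack(d_model, depth)
    recurrent = recurrent_stack(d_model, depth)
    print("6 separate layers:", count_unique_params(vanilla))      # 6 * (64 + 8) = 432
    print("1 layer stacked 6 times:", count_unique_params(recurrent))  # 64 + 8 = 72
    print(forward(recurrent, [1.0] * d_model))
```

In a deep learning framework, the same idea amounts to calling one layer module repeatedly inside the forward pass rather than instantiating a new module per depth. The parameter count shrinks by a factor equal to the depth, while the number of layer applications, and hence the per-sentence computation, stays the same.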