Sequence-Level Mixed Sample Data Augmentation
2020, pp. 5547–5552
Abstract
Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language. This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems. Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set. We connect this approach to existing techniques such as SwitchOut. SeqMix consistently yields an approximately 1.0 BLEU improvement on five different translation datasets over strong Transformer baselines. On tasks that require strong compositional generalization, such as SCAN and semantic parsing, SeqMix also offers further improvements.
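To make the "softly combining input/output sequences" idea concrete, here is a minimal mixup-style sketch in numpy. It is an illustration under stated assumptions, not the paper's implementation: the function name, the Beta-distributed mixing ratio, and the zero-padding convention for sequences of unequal length are all assumptions borrowed from the general mixup recipe.

```python
import numpy as np

def seqmix(seq_a, seq_b, vocab_size, alpha=0.1, rng=None):
    """Softly combine two token-id sequences (illustrative sketch).

    Each sequence is expanded to a (length, vocab_size) one-hot matrix,
    and the two matrices are mixed with a convex weight lam ~ Beta(alpha, alpha),
    as in mixup. Positions past the end of a shorter sequence are left as
    all-zero rows (a padding assumption, not the paper's convention).
    Returns the mixed soft-token matrix and the sampled lam.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)  # mixing ratio shared across positions

    length = max(len(seq_a), len(seq_b))
    a = np.zeros((length, vocab_size))
    b = np.zeros((length, vocab_size))
    for i, tok in enumerate(seq_a):
        a[i, tok] = 1.0
    for i, tok in enumerate(seq_b):
        b[i, tok] = 1.0

    return lam * a + (1.0 - lam) * b, lam

# Example: mix two short "sentences" over a toy vocabulary of 10 tokens.
mixed, lam = seqmix([3, 4, 5], [6, 7], vocab_size=10)
```

In a sequence-to-sequence setting the same `lam` would be applied to the source pair and the target pair, and the model would be trained on the resulting soft distributions; that coupling is what ties the input mix to the output mix.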
Related Papers
- → MOOC Dropout Prediction (2017), 86 citations
- → A model for the identification of students at risk of dropout at a university of technology (2020), 18 citations
- → Using Stacked Denoising Autoencoder for the Student Dropout Prediction (2017), 13 citations
- → Flexible parsing (1980), 31 citations
- → Combining SMT and NMT Back-Translated Data for Efficient NMT (2019), 1 citation