Muppet: Massive Multi-task Representations with Pre-Finetuning
Top 1% of 2021 papers by citations.
Abstract
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used, up until a critical point (usually above 15) after which performance improves linearly in the number of tasks.
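The abstract describes pre-finetuning as massively multi-task learning over roughly 50 datasets of very different sizes. One practical ingredient in such setups is deciding which task each training batch is drawn from. Below is a minimal sketch of size-proportional task sampling; the function name and the proportional heuristic are illustrative assumptions for this page, not the paper's exact recipe.

```python
import random
from collections import Counter

def sample_task_batches(dataset_sizes, num_batches, seed=0):
    """Pick a task for each batch, with probability proportional to
    dataset size -- a common heuristic in large-scale multi-task
    learning (hypothetical helper, not taken from the paper)."""
    rng = random.Random(seed)
    tasks = list(dataset_sizes)
    weights = [dataset_sizes[t] for t in tasks]
    return [rng.choices(tasks, weights=weights)[0] for _ in range(num_batches)]

# Example: three tasks with very different amounts of labeled data.
sizes = {"mnli": 390_000, "squad": 88_000, "boolq": 9_400}
schedule = sample_task_batches(sizes, num_batches=1000)
counts = Counter(schedule)
```

Size-proportional sampling lets large datasets dominate the schedule; in practice multi-task recipes often temper this with temperature-based reweighting so that small tasks are not starved of updates.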