# huggingface/transformers: T5 Model, BART summarization example and reduced memory, translation pipeline
## T5 Model (@patrickvonplaten, @thomwolf)

T5 is a powerful encoder-decoder model that frames every NLP problem as a text-to-text task. It achieves state-of-the-art results on a variety of NLP tasks (summarization, question answering, ...).

Five sets of pre-trained weights (pre-trained on a multi-task mixture of unsupervised and supervised tasks) are released. In ascending order, from 60 million to 11 billion parameters: `t5-small`, `t5-base`, `t5-large`, `t5-3b`, `t5-11b`.

T5 can now be used with the translation and summarization pipelines (usage sketches follow at the end of these notes).

Related: paper, official code, model available in Hugging Face's community models, docs.

Big thanks to the original authors, especially @craffel, who helped answer our questions, reviewed PRs and tested T5 extensively.

## New BART checkpoint: bart-large-xsum (@sshleifer)

These weights are from BART fine-tuned on the XSum abstractive summarization challenge, which encourages shorter (more abstractive) summaries. It achieves state-of-the-art results on that benchmark (see the summarization sketch below).

## BART summarization example with pytorch-lightning (@acarrera94)

New example: BART for summarization, using PyTorch Lightning. The script trains on CNN/DailyMail and evaluates.

## Translation pipeline (@patrickvonplaten)

A new pipeline is available, leveraging the T5 model. The T5 model was added to the summarization pipeline as well.

## Memory improvements with BART (@sshleifer)

In an effort to reduce the memory footprint and computing power necessary to run inference with BART, several improvements have been made to the model:

- Remove the LM head and use the embedding matrix instead (~200MB)
- Call the encoder before expanding `input_ids` (~1GB)
- `SelfAttention` only returns weights if `config.output_attentions` is set (~500MB)
- Use two separate, smaller decoder attention masks (~500MB)
- Drop columns that consist exclusively of `pad_token_id` from `input_ids` in the `evaluate_cnn` example

## New model: XLMForTokenClassification (@sakares)

A new head was added to XLM: `XLMForTokenClassification`, for token-level tasks such as named entity recognition.
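A minimal sketch of T5's text-to-text interface follows. It assumes a recent version of the library (the callable-tokenizer API); the task prefix and generation settings are illustrative, not prescribed by the release.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Any of the released checkpoints can be substituted here:
# t5-small, t5-base, t5-large, t5-3b, t5-11b.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text-to-text: a plain-text prefix selects the task.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```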
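For the new translation pipeline (and the T5-backed summarization pipeline), a hedged sketch: translation tasks are named per language pair (e.g. `translation_en_to_de`), and the model choice and generation arguments below are illustrative.

```python
from transformers import pipeline

# Translation pipelines are named per language pair; T5 is the model behind them.
translator = pipeline("translation_en_to_de", model="t5-base", tokenizer="t5-base")
print(translator("How old are you?"))

# T5 can also back the summarization pipeline.
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base")
print(summarizer("Long input document ...", min_length=5, max_length=40))
```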
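For the new `bart-large-xsum` checkpoint, a summarization sketch with beam search. The `facebook/` namespace reflects the current model hub layout rather than the identifier in this release, and the beam and length settings are assumptions tuned for XSum's short summaries.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-xsum")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-xsum")

article = "..."  # document to summarize
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
# XSum-style summaries are short, so a small max_length is typical.
summary_ids = model.generate(inputs["input_ids"], num_beams=4,
                             max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```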
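One of the memory savings above is user-visible: `SelfAttention` now materializes and returns attention weights only when `config.output_attentions` is set. A sketch of keeping it off at load time; the checkpoint name and the kwargs-based config override are assumptions about the current API, not part of the release notes.

```python
from transformers import BartForConditionalGeneration

# With output_attentions=False (the default), the forward pass skips
# returning the full per-layer attention-weight tensors, saving memory.
model = BartForConditionalGeneration.from_pretrained(
    "facebook/bart-large", output_attentions=False
)
```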
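Finally, a minimal sketch of the new `XLMForTokenClassification` head. The checkpoint name and label count are illustrative assumptions; the classification head is randomly initialized until fine-tuned on a labeled token-level dataset.

```python
import torch
from transformers import XLMForTokenClassification, XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-en-2048")
# num_labels sizes the (randomly initialized) token-classification head.
model = XLMForTokenClassification.from_pretrained("xlm-mlm-en-2048", num_labels=9)

inputs = tokenizer("Hugging Face is based in New York", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]  # shape: (batch, seq_len, num_labels)
predictions = logits.argmax(dim=-1)  # one label id per token
```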