# huggingface/transformers: Trainer, TFTrainer, Multilingual BART, Encoder-decoder improvements, Generation Pipeline
## Trainer & TFTrainer

Version 2.9 introduces a new `Trainer` class for PyTorch, and its equivalent `TFTrainer` for TF 2. This allowed us to completely reorganize the example scripts for a cleaner codebase.

The main features of the `Trainer` are:

- Same user-facing API for PyTorch and TF 2
- Support for CPU, GPU, Multi-GPU, and TPU
- Easier than ever to share your fine-tuned models

The `TFTrainer` was largely contributed by awesome community member @jplu! 🔥🔥

A few additional features of the example scripts are:

- Argparsers generated from type hints on dataclasses
- Arguments can be loaded from JSON files
- Logging through TensorBoard and wandb

(Sketches of the `Trainer` API and the dataclass-based argument parsing are given after the Pipelines section below.)

Documentation for the Trainer is still a work in progress; please consider contributing improvements.

## TPU Support

Both the TensorFlow and PyTorch trainers have TPU support (@jplu, @LysandreJik, @julien-c). An additional utility was added so that the TPU scripts may be launched in a manner similar to `torch.distributed`. This was built with the support of @jysohn23, a member of the Google TPU team.

## Multilingual BART (@sshleifer)

New BART checkpoint converted: this adds the mbart-en-ro model, a BART variant fine-tuned on English-Romanian translation (see the translation sketch below).

## Improved support for huggingface/tokenizers

- Additional tests and support have been added for the huggingface/tokenizers tokenizers (@mfuntowicz, @thomwolf)
- TensorFlow models work out of the box with the new tokenizers (@LysandreJik)

## Decoder caching for T5 (@patrickvonplaten)

Auto-regressive decoding for T5 has been greatly sped up by storing past key/value states, in both PyTorch and TensorFlow (see the sketch below).

**Breaking change:** this increases the default output length of `T5Model` and `T5ForConditionalGeneration` from 4 to 5 (including the `past_key_value_states`).

## Encoder-Decoder enhancements

- Apply the encoder-decoder 1.5GB memory savings to TF as well (@patrickvonplaten, a translation of the same work on the PyTorch models by @sshleifer)
- The BART summarization fine-tuning script now works for T5 as well (@sshleifer)
- Clean encoder-decoder models with a BART/T5-like API, and add the possibility to `generate` (@patrickvonplaten)

## Additional model architectures

Question answering support for ALBERT and RoBERTa in TF (@Pierrci): `TFAlbertForQuestionAnswering` and `TFRobertaForQuestionAnswering`.

## Pipelines

- The question answering pipeline now handles impossible answers (@bryant1410; see the sketch below)
- Remove tqdm logging (@mfuntowicz)
- The sentiment analysis pipeline can now handle more than two sequences (@xxbidiao)
- Rewritten batch support in pipelines (@mfuntowicz)

### Text Generation pipeline (@enzoampil)

Implements a text generation pipeline, `GenerationPipeline`, which works with any model that has an LM head (`ModelWithLMHead`).
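For example, a minimal sketch of the new pipeline. The `text-generation` task name and the `gpt2` model choice reflect how the pipeline is exposed in current versions of the library, not the release notes themselves:

```python
from transformers import pipeline

# Text generation pipeline; the task is registered as "text-generation".
generator = pipeline("text-generation", model="gpt2")

# Returns a list of dicts with a "generated_text" key; generate() kwargs
# such as max_length are forwarded to the underlying model.
print(generator("Version 2.9 of transformers introduces", max_length=30))
```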
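Returning to the new `Trainer`: below is a minimal sketch of the API. The toy dataset and hyperparameters are illustrative only, and argument names follow current versions of the library (in v2.9 the batch-size flag was spelled `per_gpu_train_batch_size`):

```python
import torch
from torch.utils.data import Dataset
from transformers import (
    BertForSequenceClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)


class ToyDataset(Dataset):
    """Tiny in-memory dataset, just enough to exercise the Trainer API."""

    def __init__(self, tokenizer, texts, labels):
        self.features = [
            tokenizer(t, max_length=32, padding="max_length", truncation=True)
            for t in texts
        ]
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v) for k, v in self.features[i].items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=1,
    per_device_train_batch_size=2,  # spelled per_gpu_train_batch_size in v2.9
    logging_dir="runs",             # TensorBoard logs land here
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ToyDataset(tokenizer, ["great movie", "terrible movie"], [1, 0]),
)
trainer.train()
```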
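The dataclass-driven argument parsing in the example scripts is handled by `HfArgumentParser`. A sketch, where the `ModelArguments` dataclass is illustrative rather than taken from any particular script:

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    # Any typed field on the dataclass becomes a CLI flag automatically.
    model_name_or_path: str = field(
        default="bert-base-uncased",
        metadata={"help": "Model checkpoint to fine-tune."},
    )


parser = HfArgumentParser((ModelArguments, TrainingArguments))

# Parse from the command line, e.g.:
#   python finetune.py --output_dir out --model_name_or_path roberta-base
model_args, training_args = parser.parse_args_into_dataclasses()

# Alternatively, the same arguments can be loaded from a JSON file:
# model_args, training_args = parser.parse_json_file("args.json")
```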
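A translation sketch for the new multilingual BART checkpoint. The hub identifier `facebook/mbart-large-en-ro` and the Auto classes are assumptions based on how the model is exposed in current versions of the library; the release itself refers to it simply as mbart-en-ro:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed hub name for the release's "mbart-en-ro" checkpoint.
checkpoint = "facebook/mbart-large-en-ro"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

batch = tokenizer("The trainer supports TPUs out of the box.", return_tensors="pt")
translated = model.generate(**batch)
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```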
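The T5 decoder caching is transparent when decoding with `generate()`; a minimal sketch using the `t5-small` checkpoint:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)

# generate() now reuses cached key/value states between decoding steps
# rather than re-computing attention over the whole prefix at every step.
output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Breaking change noted above: in v2.9 a direct forward pass returns one
# extra element (past_key_value_states), so the default output tuple of
# T5Model / T5ForConditionalGeneration has length 5 instead of 4.
```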
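And a sketch of the impossible-answer handling in the question answering pipeline, assuming the `handle_impossible_answer` flag introduced for this behavior:

```python
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(
    question="Who maintains the repository?",
    context="The library provides thousands of pretrained models.",
    # Allow the pipeline to return an empty answer when the context
    # does not actually contain one.
    handle_impossible_answer=True,
)
print(result)
```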
## Fixes and improvements

- Clean the generate testing functions (@patrickvonplaten)
- Notebooks updated in the documentation (@LysandreJik)
- Fix RoBERTa/XLNet pad token in run_multiple_choice.py (@ethanjperez)
- Fixed RoBERTa conversion script (@myleott)
- Speed up torch summarization tests (@sshleifer)
- Optimize causal mask using torch.where (@Akababa)
- Improved benchmarking utils (@patrickvonplaten)
- Fixed edge case for BERT tokenization (@patrickvonplaten)
- SummarizationDataset cleanup (@sshleifer)
- BART: replace config.output_past with a use_cache kwarg (@sshleifer)
- Better documentation for the summarization and translation pipelines (@julien-c)
- Additional documentation for model cards (@julien-c)
- Fix force_download of files on Windows (@calpt)
- Fix shuffling issue for distributed training (@elk-cloner)
- Shift labels internally within TransfoXLLMHeadModel when called with labels (@TevenLeScao)
- Remove output_past everywhere and replace it with a use_cache argument (@patrickvonplaten)
- Added a unit test for run_bart_sum (@sshleifer)
- Cleaner code by factoring a few methods back into PreTrainedModel (@sshleifer)
- [Bert] remove hard-coded pad token id (@patrickvonplaten)
- Clean pipelines test and remove unnecessary code (@patrickvonplaten)
- JITting is not compatible with PyTorch/XLA or any other framework that requires serialization; the JITted methods were removed (@LysandreJik)
- Change newstest2013 to newstest2014 and clean up (@patrickvonplaten)
- Factor out the tensor conversion method in PreTrainedTokenizer (@sshleifer)
- Remove tanh torch warnings (@aryanshomray)
- Fix token_type_id in the BERT question answering example (@siboehm)
- Add CircleCI workflow to build docs for preview (@harupy)
- Higher tolerance for past testing in T5 and TF T5 (@patrickvonplaten)
- XLM tokenizer should encode with bos token (@LysandreJik)
- XLM tokenizer should encode with bos token (@patrickvonplaten)
- Fix summarization do_predict (@sshleifer)
- Encode to the max length of the input, not the max length of the tokenizer, for batch input (@patrickvonplaten)
- Add qas_id to SquadResult and SquadExample (@jarednielsen)
- Fix bug in run_*.py scripts: double wrapping in DataParallel during eval (@and-kul)
- Fix torchhub integration (@julien-c)
- Fix TFAlbertForSequenceClassification classifier dropout probability (@jarednielsen)
- Change uses of pow(x, 3) to pow(x, 3.0) (@mneilly-et)
- Shuffle the train subset for the summarization example (@Colanim)
- Removed the boto3 dependency (@julien-c)
- Add DialoGPT training tips (@patrickvonplaten)
- Generation can now start with an empty prompt (@patrickvonplaten; see the sketch after this list)
- GPT-2 is now traceable (@jazzcook15)
- Add known third-party packages to setup.cfg; removes the local/CircleCI isort discrepancy (@sshleifer)
- Allow a more backward-compatible behavior of max_len_single_sentence and max_len_sentences_pair (@thomwolf)
- Now using CDN URLs for weights (@julien-c)
- [Fix common tests on GPU] send model, ids to torch_device (@sshleifer)
- Fix TF input docstrings to refer to tf.Tensor rather than torch.Float (@jarednielsen)
- Additional metadata for training arguments (@parmarsuraj99)
- [ci] Load pretrained models into the default (long-lived) cache (@julien-c)
- Add timeout_decorator to tests (@sshleifer)
- Added XLM-R to the multilingual section of the documentation (@stefan-it)
- Better num_labels in configuration objects
- Updated PyTorch Lightning scripts (@williamFalcon)
- Tests now pass with torch 1.5.0 (@LysandreJik)
- Ensure the fast tokenizer can construct a single-element tensor without a pad token (@mfuntowicz)
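One item above is worth a quick illustration: generation can now start from an empty prompt. A minimal sketch (the model choice and sampling flags are illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# With no input_ids supplied, generation starts from the model's
# bos_token_id instead of requiring a non-empty prompt.
output = model.generate(max_length=20, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```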