A generic communication scheduler for distributed DNN training acceleration
Top 1% of 2019 papers
Abstract
We present ByteScheduler, a generic communication scheduler for distributed DNN training acceleration. ByteScheduler is based on our principled analysis that partitioning and rearranging tensor transmissions yields optimal results in theory and good performance in practice, even accounting for scheduling overhead. To make ByteScheduler work across diverse DNN training frameworks, we introduce a unified abstraction and a Dependency Proxy mechanism that enable communication scheduling without breaking the original dependencies in framework engines. We further introduce a Bayesian Optimization approach to auto-tune the tensor partition size and other parameters for different training models under various networking conditions. ByteScheduler currently supports TensorFlow, PyTorch, and MXNet without modifying their source code, and works with both Parameter Server (PS) and all-reduce architectures for gradient synchronization, over either TCP or RDMA. Our experiments show that ByteScheduler accelerates training across all evaluated system configurations and DNN models, by up to 196% (i.e., 2.96× the original speed).
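The core idea behind the scheduler can be illustrated with a small sketch: gradient tensors are partitioned into chunks, and chunks belonging to earlier layers are transmitted first, since backpropagation produces gradients in reverse layer order while the next iteration's forward pass needs the front layers' parameters first. The names below (`partition`, `schedule`, the fixed chunk size) are hypothetical, not ByteScheduler's actual API; in the real system the partition size is auto-tuned (e.g., via Bayesian Optimization) rather than fixed.

```python
import heapq

PARTITION_SIZE = 4  # elements per chunk; illustrative only, tuned in practice

def partition(tensor_id, priority, length, size=PARTITION_SIZE):
    """Split a tensor of `length` elements into chunk descriptors."""
    return [(priority, tensor_id, start, min(start + size, length))
            for start in range(0, length, size)]

def schedule(tensors):
    """tensors: list of (tensor_id, priority, length).
    A lower priority value (earlier layer) is transmitted first.
    Returns the transmission order of all chunks."""
    heap = []
    for tensor_id, priority, length in tensors:
        for chunk in partition(tensor_id, priority, length):
            heapq.heappush(heap, chunk)
    order = []
    while heap:
        _priority, tensor_id, start, end = heapq.heappop(heap)
        # A real scheduler would issue the PS push / all-reduce here.
        order.append((tensor_id, start, end))
    return order

# Backprop emits layer2's gradient first, but layer0 is needed first
# in the next forward pass, so its chunks jump the queue.
order = schedule([("layer2", 2, 6), ("layer0", 0, 6), ("layer1", 1, 6)])
print(order[:2])  # -> [('layer0', 0, 4), ('layer0', 4, 6)]
```

Partitioning is what makes the rearrangement effective: without it, a large low-priority tensor already on the wire would block an urgent front-layer tensor behind it.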