Parallelizing DNN Training on GPUs: Challenges and Opportunities
Abstract
In recent years, Deep Neural Networks (DNNs) have emerged as a widely adopted approach in many application domains, and training DNN models now constitutes a significant fraction of the datacenter workload. Recent evidence shows that modern DNNs are becoming more complex and that the number of DNN parameters (i.e., weights) is growing. In addition, a large amount of input data is required to train DNN models to target accuracy. As a result, training performance has become one of the major challenges limiting DNN adoption in real-world applications. Recent works have explored different parallelism strategies (i.e., data parallelism and model parallelism) and used multiple GPUs in datacenters to accelerate the training process. However, naively adopting data parallelism or model parallelism across multiple GPUs can lead to sub-optimal execution. The major reasons are i) the large amount of data movement, which prevents the system from feeding the GPUs with the required data in a timely manner (for data parallelism); and ii) low GPU utilization caused by data dependencies between layers placed on different devices (for model parallelism).
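The two strategies the abstract contrasts can be sketched with a toy example (not taken from the paper; "devices" are plain Python lists and each layer is a placeholder `+1` op). Data parallelism replicates the full model and splits the batch, so shards can run concurrently but gradients must later be exchanged; model parallelism splits the layers across devices, so activations flow device to device and each device idles until the previous one finishes, which is the dependency bottleneck described above.

```python
def data_parallel(batch, num_devices, num_layers=4):
    """Each device holds a FULL copy of the model and a shard of the batch.
    All shards can execute concurrently; in real training, gradients from
    the shards would then be averaged (an all-reduce), which is the data
    movement the abstract identifies as the bottleneck."""
    shard = len(batch) // num_devices
    shards = [batch[i * shard:(i + 1) * shard] for i in range(num_devices)]
    # every device applies the same num_layers-layer model to its shard
    return [[x + num_layers for x in s] for s in shards]

def model_parallel(batch, num_devices, num_layers=4):
    """Layers are partitioned across devices; the whole batch visits each
    device in turn. Device k+1 cannot start until device k produces its
    activations, so devices sit idle (the low-utilization problem)."""
    layers_per_dev = num_layers // num_devices
    acts = list(batch)
    for dev in range(num_devices):          # sequential, not concurrent
        for _ in range(layers_per_dev):     # layers owned by this device
            acts = [x + 1 for x in acts]
    return acts
```

Both partitionings compute the same function, which is why the choice between them is a systems question (communication volume vs. device utilization) rather than a numerical one.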