Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models
2022, pp. 93–106
Shibo Wang, Jinliang Wei, Amit Sabne, Andy Davis, Berkin Ilbeyi, Blake A. Hechtman, Dehao Chen, Karthik Murthy, Marcello Maggioni, Qiao Zhang, Sameer Kumar, Tongfei Guo, Yuanzhong Xu, Zongwei Zhou
Abstract
Large deep learning models have shown great potential, achieving state-of-the-art results on many tasks. However, running these models on a single accelerator (GPU or TPU) is challenging because on-device memory is too limited for their size. Intra-layer model parallelism addresses this issue by partitioning individual layers or operators across multiple devices in a distributed accelerator cluster. However, the data communication generated by intra-layer model parallelism can account for a significant fraction of the overall execution time and severely hurt computational efficiency.
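For concreteness, here is a minimal sketch (not from the paper) of intra-layer model parallelism using JAX's sharding API, in the same TPU/XLA spirit as the authors' setting. The mesh axis name "model" and all shapes are illustrative assumptions; the compiler inserts the collective communication that the paper seeks to overlap with computation.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh; with N devices the weight below is split N ways.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Activations replicated, weight column-sharded across the "model" axis:
# each device computes one slice of the layer's output.
x = jax.device_put(jnp.ones((8, 1024)), NamedSharding(mesh, P()))
w = jax.device_put(jnp.ones((1024, 4096)), NamedSharding(mesh, P(None, "model")))

@jax.jit
def layer(x, w):
    # Each device multiplies against its local weight shard. Materializing
    # the full output (e.g., for a differently sharded next layer) forces a
    # collective whose cost is the overhead described in the abstract.
    return jnp.dot(x, w)

y = layer(x, w)
print(y.shape, y.sharding)  # (8, 4096), sharded over the "model" axis
```

On a single device this runs with a trivial one-element mesh; with multiple devices, the dependent collectives serialize with the matmuls unless they are decomposed and overlapped as the paper proposes.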