End-to-end performance modeling of distributed GPU applications

2020, pp. 1–12

Jaemin Choi, David F. Richards, Laxmikant V. Kalé, Abhinav Bhatelé

Abstract

With the growing number of GPU-based supercomputing platforms and GPU-enabled applications, the ability to accurately model the performance of such applications is becoming increasingly important. Most current performance models for GPU-enabled applications are limited to single-node performance. In this work, we propose a methodology for end-to-end performance modeling of distributed GPU applications. Our work strives to create performance models that are both accurate and easily applicable to any distributed GPU application. We combine trace-driven simulation of MPI communication using the TraceR-CODES framework with a profiling-based roofline model for GPU kernels. We make substantial modifications to these models to capture the complex effects of both on-node and off-node networks in today's multi-GPU supercomputers. We validate our model against empirical data from GPU platforms and also vary tunable parameters of our model to observe how they might affect application performance.
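
For readers unfamiliar with the roofline model mentioned above, the following is a minimal sketch of the textbook roofline bound for a single kernel. It is not the modified, profiling-based model the paper describes. The function name `roofline_time` and its parameters are hypothetical and chosen for illustration only.

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Return (estimated_seconds, limiting_resource) for one kernel.

    Textbook roofline bound, assumed here for illustration: the kernel
    cannot finish faster than its compute time or its memory-traffic
    time, whichever is larger.
    """
    compute_time = flops / peak_flops            # seconds if compute-bound
    memory_time = bytes_moved / peak_bandwidth   # seconds if memory-bound
    if compute_time >= memory_time:
        return compute_time, "compute"
    return memory_time, "memory"


# Hypothetical kernel and device numbers, not taken from the paper.
seconds, limit = roofline_time(
    flops=1.0e12,
    bytes_moved=3.0e12,
    peak_flops=2.0e12,
    peak_bandwidth=1.0e12,
)
print(f"{seconds:.2f} s, {limit}-bound")
```

In an end-to-end setting, per-kernel estimates of this kind would be combined with simulated communication times. How the two are combined, and how the on-node and off-node network effects enter, is specific to the paper and is not reproduced here.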
