Mystic: Predictive Scheduling for GPU-Based Cloud Servers Using Machine Learning
Top 10% of 2016 papers by citations
Abstract
GPUs have become the primary accelerators for high-end data centers and cloud servers, which can host thousands of disparate applications. As demand for GPUs in clusters grows, there arises a need for efficient co-execution of applications on the same accelerator device. However, resource contention among co-executing applications causes interference, which degrades execution performance, impacts the QoS requirements of applications, and lowers overall system throughput. While previous work has proposed techniques for detecting interference, existing solutions are either developed for CPU clusters or use static profiling approaches, which can be computationally intensive and do not scale well. We present Mystic, an interference-aware scheduler for efficient co-execution of applications on GPU-based clusters and cloud servers. The most important feature of Mystic is its use of learning-based analytical models for detecting interference between applications. We leverage a collaborative filtering framework to characterize an incoming application with respect to the interference it may cause when co-executing with other applications while sharing GPU resources. Mystic identifies the similarities between new applications and the executing applications, and guides the scheduler to minimize interference and improve system throughput. We train the learning model on 42 CUDA applications, and evaluate on a separate set of 55 diverse, real-world GPU applications. Mystic is evaluated on a live GPU cluster with 32 NVIDIA GPUs. Our framework achieves performance guarantees for 90.3% of the evaluated applications. Compared with state-of-the-art interference-oblivious schedulers, Mystic improves system throughput by 27.5% on average, and achieves a 16.3% average improvement in GPU utilization.
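To illustrate the collaborative-filtering idea the abstract describes, the sketch below fills in a partially observed application-pair interference matrix with low-rank matrix factorization, so a scheduler can estimate the slowdown of untested co-execution pairs. This is a hypothetical toy, not Mystic's actual model: the function name, rank, learning rate, and the gradient-descent solver are all assumptions for illustration.

```python
import numpy as np

def predict_interference(observed, rank=2, iters=2000, lr=0.01, reg=0.02, seed=0):
    """Fill missing entries of a partially observed interference matrix via
    gradient-descent matrix factorization (an illustrative assumption; the
    paper's exact collaborative-filtering model is not reproduced here)."""
    rng = np.random.default_rng(seed)
    n, m = observed.shape
    mask = ~np.isnan(observed)            # True where a slowdown was profiled
    R = np.where(mask, observed, 0.0)
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    for _ in range(iters):
        err = mask * (R - U @ V.T)        # error only on observed entries
        U += lr * (err @ V - reg * U)
        V += lr * (err.T @ U - reg * V)
    return U @ V.T                        # predictions for all pairs

# Toy example: 4 applications; entry [i, j] is the observed slowdown when
# app i shares a GPU with app j (np.nan = pair not yet profiled).
obs = np.array([
    [1.0, 1.8, np.nan, 1.2],
    [1.8, 1.0, 2.1,    np.nan],
    [np.nan, 2.1, 1.0, 1.5],
    [1.2, np.nan, 1.5, 1.0],
])
pred = predict_interference(obs)
```

A scheduler could then place an incoming application on the GPU whose current co-runners have the lowest predicted slowdown, which is the spirit of Mystic's interference-aware placement.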
Related Papers
- Multi-level parallelism for incompressible flow computations on GPU clusters (2012), 63 citations
- Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA (2012), 34 citations
- Parallelizing Motion JPEG 2000 with CUDA (2009), 18 citations
- A parallel design of computer Go engine on CUDA-enabled GPU (2011), 1 citation
- Efficiency of using NVIDIA coprocessors in modeling the behavior of charge carriers in graphene (2021)