Exploring Plan-Based Scheduling for Large-Scale Computing Systems
Citations over time: top 18% of 2016 papers
Abstract
As HPC systems scale toward exascale, managing the underlying resources effectively becomes critical. Almost all existing resource management systems schedule jobs in a queuing fashion and therefore make isolated scheduling decisions that can compromise system performance, even with backfilling. Plan-based schedulers have the potential to generate better job schedules by producing an execution plan for all waiting jobs, but they have not received enough attention. In this paper, we present a novel plan-based scheduling system that uses simulated annealing as its optimization engine to support effective resource management on HPC systems. Extensive trace-based simulations with workload traces collected from a wide range of production supercomputers show that, compared with a queue-based scheduling system using FCFS with EASY backfilling, our plan-based scheduling system can reduce job wait time by 40% and job response time by 30% while slightly improving system utilization. Moreover, our plan-based system can run online, solving the scheduling problem at each scheduling iteration within one second, which makes it practical for production HPC systems.
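To make the idea of plan-based scheduling with simulated annealing concrete, here is a minimal sketch in Python. It is not the system described above. The abstract does not specify the cost function, the neighborhood move, or the cooling schedule, so everything here is an assumption made for illustration: the `Job` fields, the names `total_wait` and `anneal`, the summed-wait objective, the pairwise-swap move, and the geometric cooling parameters. The sketch also assumes all jobs are already waiting at time 0 and that runtimes are known exactly.

```python
import heapq
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int      # nodes requested (assumed <= cluster size)
    runtime: float  # runtime estimate, treated as exact in this sketch


def total_wait(order, total_nodes):
    """Start jobs in plan order, each as early as nodes allow; return summed wait."""
    free = total_nodes
    running = []  # min-heap of (end_time, nodes) for jobs already placed
    now = 0.0     # start time of the most recently placed job
    wait = 0.0
    for job in order:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} needs more nodes than the cluster has")
        # Advance time until enough nodes have been released.
        while free < job.nodes:
            end, nodes = heapq.heappop(running)
            now = max(now, end)
            free += nodes
        wait += now  # all jobs are assumed to be submitted at time 0
        free -= job.nodes
        heapq.heappush(running, (now + job.runtime, job.nodes))
    return wait


def anneal(jobs, total_nodes, steps=2000, t0=10.0, cooling=0.995, seed=0):
    """Search over job orderings with simulated annealing (hypothetical parameters)."""
    rng = random.Random(seed)
    current = list(jobs)
    cur_cost = total_wait(current, total_nodes)
    best, best_cost = list(current), cur_cost
    temp = t0
    if len(current) < 2:
        return best, best_cost
    for _ in range(steps):
        # Neighbor move: swap two positions in the plan.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = total_wait(current, total_nodes)
        delta = new_cost - cur_cost
        # Metropolis rule: always accept improvements, sometimes accept worse plans.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
        temp *= cooling  # geometric cooling
    return best, best_cost


jobs = [Job("A", 4, 10.0), Job("B", 1, 1.0), Job("C", 1, 1.0)]
best, best_cost = anneal(jobs, total_nodes=4)
print([j.name for j in best], best_cost)
```

In this toy instance, running the wide job `A` first makes both small jobs wait for it, whereas a plan that considers all waiting jobs together can start the small jobs first. A production scheduler would also have to handle job arrival times, inaccurate runtime estimates, and a time budget per scheduling iteration, none of which this sketch models.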