LoadAtomizer: A locality and I/O load aware task scheduler for MapReduce
Abstract
Data-intensive computing systems such as MapReduce and Dryad have emerged as frameworks for leveraging the computing resources of a cluster. I/O bottlenecks need to be eased to improve performance in such systems. State-of-the-art frameworks for data-intensive computing have tackled the issue with a task scheduling policy based on data locality. However, locality-aware scheduling does not always mitigate I/O bottlenecks well when jobs with different I/O characteristics run concurrently. This paper presents LoadAtomizer, a locality and I/O load aware task scheduler for MapReduce. LoadAtomizer mitigates the I/O bottlenecks of a cluster through locality and I/O load aware map task assignment and storage selection. It quickly assigns a slave a map task whose input data is stored on a lightly loaded storage device and instructs the slave to read the input data from that device. LoadAtomizer maintains load information for the storage devices and the network in a topology-aware load tree. This tree enables LoadAtomizer to quickly select a lightly loaded storage device that a slave can access through a lightly loaded network path. Experimental results show that our prototype of LoadAtomizer shortened the completion time of multiple jobs by up to 18.6%.
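The abstract does not describe the load tree or its load metric in detail, so the following is only a minimal sketch of the idea under stated assumptions. It assumes a tree whose leaves are hosts with local storage and whose inner nodes are switches, a normalized load value per node, and a bottleneck metric (the highest load on the path from slave to storage). The names `Node`, `path_load`, and `pick_task` are hypothetical and are not taken from the paper or from Hadoop.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # eq=False: nodes are compared and hashed by identity
class Node:
    """Hypothetical load-tree node: a switch, or a host with local storage."""
    name: str
    load: float = 0.0                 # assumed normalized load in [0, 1]
    parent: Optional["Node"] = None

    def ancestors(self) -> List["Node"]:
        """Return the chain from this node up to the root, inclusive."""
        chain: List["Node"] = []
        node: Optional["Node"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


def path_load(slave: Node, storage: Node) -> float:
    """Highest load on the path from slave to storage (assumed metric).

    The path covers the storage host itself plus every switch between the
    two hosts. The slave's own storage load counts only for a local read.
    """
    up = slave.ancestors()
    down = storage.ancestors()
    on_down = set(down)
    # Lowest common ancestor: first node on the slave's chain that is also
    # on the storage's chain.
    lca = next(n for n in up if n in on_down)
    path = [storage] + up[1:up.index(lca) + 1] + down[1:down.index(lca)]
    return max(n.load for n in path)


def pick_task(
    slave: Node, pending: Dict[str, List[Node]]
) -> Optional[Tuple[str, Node]]:
    """Pick the (task, replica) pair with the lightest path load for a slave.

    `pending` maps a task id to the hosts holding replicas of its input.
    Ties go to the smallest task id, then to the earliest listed replica.
    """
    best: Optional[Tuple[float, str, Node]] = None
    for task_id, replicas in sorted(pending.items()):
        for storage in replicas:
            cost = path_load(slave, storage)
            if best is None or cost < best[0]:
                best = (cost, task_id, storage)
    return None if best is None else (best[1], best[2])
```

In this sketch, a slave whose local disk is heavily loaded is handed a task whose input sits on a lightly loaded neighbour in the same rack rather than a data-local task, which is the kind of case where a purely locality-based policy would be expected to fall short. The sketch scans all candidates linearly; the fast selection that the abstract attributes to the load tree is not modelled here.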