Boldio: A hybrid and resilient burst-buffer over lustre for accelerating big data I/O
Citations over time: top 10% of 2016 papers
Abstract
Limited local storage space in HPC environments places unprecedented demands on the performance of the underlying shared parallel file systems. This has necessitated a scalable solution for running Big Data middleware (e.g., Hadoop) on HPC clusters. In this paper, we propose Boldio, a hybrid and resilient key-value store-based Burst-Buffer system Over Lustre for accelerating I/O-intensive Big Data workloads. Boldio can leverage RDMA on high-performance interconnects and storage technologies such as PCIe-/NVMe-SSDs. We demonstrate that Boldio can improve the performance of the I/O phase of Hadoop workloads running on HPC clusters, serving as a lightweight, high-performance, and resilient remote I/O staging layer between the application and Lustre. Performance evaluations show that Boldio can improve TestDFSIO write performance over Lustre by up to 3x and TestDFSIO read performance by 7x, while reducing the execution time of the Hadoop Sort benchmark by up to 30%. We also demonstrate that Boldio can significantly improve Hadoop I/O throughput compared with popular in-memory distributed storage systems such as Alluxio (formerly Tachyon) when high-speed local storage is limited.
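To make the idea of a key-value staging layer in front of a parallel file system concrete, here is a minimal toy sketch in Python. It is not Boldio's implementation: the class `ToyBurstBuffer`, its methods, the chunk size, and the use of a local temporary directory in place of Lustre are all hypothetical, and the sketch omits RDMA, SSD tiers, replication, and asynchronous persistence. It shows only the general pattern: writes land in a fast key-value store, are flushed to the backing store later, and reads are served from the buffer when the data is still there.

```python
import os
import tempfile


class ToyBurstBuffer:
    """Toy staging layer (hypothetical, not Boldio's design)."""

    def __init__(self, backing_dir, chunk_size=4):
        self.backing_dir = backing_dir  # stands in for the parallel file system
        self.chunk_size = chunk_size    # bytes per key-value pair (tiny for the demo)
        self.store = {}                 # stands in for the key-value store
        self.dirty = set()              # names with chunks not yet flushed

    def _drop(self, name):
        # Remove every buffered chunk belonging to one file name.
        for key in [k for k in self.store if k[0] == name]:
            del self.store[key]

    def _buffered(self, name):
        # Reassemble a file from its chunks, or return None if not buffered.
        keys = sorted(k for k in self.store if k[0] == name)
        if not keys:
            return None
        return b"".join(self.store[k] for k in keys)

    def write(self, name, data):
        # Split the payload into fixed-size chunks keyed by (name, index).
        self._drop(name)
        for start in range(0, max(len(data), 1), self.chunk_size):
            index = start // self.chunk_size
            self.store[(name, index)] = data[start:start + self.chunk_size]
        self.dirty.add(name)

    def read(self, name):
        # Serve from the buffer when possible, else fall back to the backing store.
        data = self._buffered(name)
        if data is not None:
            return data
        with open(os.path.join(self.backing_dir, name), "rb") as f:
            return f.read()

    def flush(self):
        # Persist all dirty files to the backing store (synchronous in this toy).
        for name in sorted(self.dirty):
            with open(os.path.join(self.backing_dir, name), "wb") as f:
                f.write(self._buffered(name))
        self.dirty.clear()

    def evict(self, name):
        # Free buffer space; refuse to drop data that was never persisted.
        if name in self.dirty:
            raise ValueError("cannot evict unflushed data: " + name)
        self._drop(name)


def demo():
    with tempfile.TemporaryDirectory() as backing:
        bb = ToyBurstBuffer(backing)
        bb.write("part-0", b"hello world")
        on_disk_before_flush = os.path.exists(os.path.join(backing, "part-0"))
        bb.flush()
        bb.evict("part-0")
        return on_disk_before_flush, bb.read("part-0")
```

Calling `demo()` writes a small payload, confirms that nothing reaches the backing directory before `flush()`, and then reads the data back from the backing directory after eviction. A real burst buffer would flush in the background and keep redundant copies so that a buffer-node failure does not lose unflushed data.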