CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart
Abstract
Checkpoint/Restart (C/R) mechanisms have been widely adopted by many MPI libraries [1-3] to achieve fault tolerance. However, a major limitation of such mechanisms is the intensive I/O bottleneck caused by the need to dump snapshots of all processes to persistent storage. Several studies have sought to minimize this overhead [4,5], but most of the proposed optimizations are implemented inside a specific MPI stack, checkpointing library, or application, and hence are not portable to other MPI stacks and applications. In this paper, we propose a filesystem-based approach to alleviate this checkpoint I/O bottleneck. We propose a new filesystem, named Checkpoint-Restart Filesystem (CRFS), which is a lightweight user-level filesystem based on FUSE (Filesystem in Userspace). CRFS is designed with Checkpoint/Restart I/O traffic in mind to handle concurrent write requests efficiently. Any software component using standard filesystem interfaces can transparently benefit from CRFS's capabilities. CRFS intercepts checkpoint file write system calls and aggregates them into fewer, larger chunks, which are written asynchronously to the underlying filesystem for more efficient I/O. CRFS manages a flexible internal I/O thread pool that throttles concurrent I/O to alleviate contention and improve performance. CRFS can be mounted over any standard filesystem, such as ext3, NFS, and Lustre. We have implemented CRFS and evaluated its performance using three popular C/R-capable MPI stacks: MVAPICH2, MPICH2, and OpenMPI. Experimental results show significant performance gains for all three MPI stacks. CRFS achieves up to a 5.5X speedup in checkpoint writing performance on the Lustre filesystem. Similar improvements are obtained with the ext3 and NFS filesystems. To the best of our knowledge, this is the first such portable and lightweight filesystem designed for generic Checkpoint/Restart data.
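The write path described in the abstract (aggregate many small writes into fewer, larger chunks, then write those chunks asynchronously from a bounded I/O thread pool) can be illustrated with a minimal Python sketch. This is not the CRFS implementation, which is a FUSE-based filesystem; the class name `AggregatingWriter`, the `demo` helper, the chunk size, and the worker count are all hypothetical choices made for illustration. The sketch assumes a POSIX system, since it relies on `os.pwrite`.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class AggregatingWriter:
    """Hypothetical sketch: buffer small writes, flush them as large chunks."""

    def __init__(self, path, pool, chunk_size=4 * 1024 * 1024):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self._pool = pool              # shared, bounded I/O thread pool
        self._chunk_size = chunk_size  # illustrative value, not from the paper
        self._buffer = bytearray()     # bytes accepted but not yet issued
        self._offset = 0               # file offset of the start of the buffer
        self._pending = []             # futures for in-flight chunk writes
        self.chunks_issued = 0

    def write(self, data):
        """Accept a small write; issue a backend write per full chunk."""
        self._buffer += data
        while len(self._buffer) >= self._chunk_size:
            self._submit(self._chunk_size)
        return len(data)

    def _submit(self, nbytes):
        # Copy the chunk out of the buffer and hand it to the pool.
        chunk = bytes(self._buffer[:nbytes])
        del self._buffer[:nbytes]
        future = self._pool.submit(self._pwrite_all, chunk, self._offset)
        self._pending.append(future)
        self._offset += nbytes
        self.chunks_issued += 1

    def _pwrite_all(self, chunk, offset):
        # Positional writes let workers complete in any order safely.
        view = memoryview(chunk)
        while len(view) > 0:
            written = os.pwrite(self._fd, view, offset)
            view = view[written:]
            offset += written

    def close(self):
        """Flush the partial tail chunk and wait for all in-flight writes."""
        if self._buffer:
            self._submit(len(self._buffer))
        for future in self._pending:
            future.result()  # re-raises any I/O error from a worker
        os.close(self._fd)


def demo():
    """Write 50 small records and report (contents_match, chunks_issued)."""
    payload = os.urandom(1000) * 50  # 50,000 bytes of checkpoint-like data
    with tempfile.TemporaryDirectory() as tmp, \
            ThreadPoolExecutor(max_workers=2) as pool:
        path = os.path.join(tmp, "rank0.ckpt")
        writer = AggregatingWriter(path, pool, chunk_size=16384)
        for start in range(0, len(payload), 1000):
            writer.write(payload[start:start + 1000])
        writer.close()
        with open(path, "rb") as f:
            contents_match = f.read() == payload
    return contents_match, writer.chunks_issued


if __name__ == "__main__":
    print(demo())
```

In this sketch, 50 application-level writes of 1,000 bytes each reach the underlying file as 4 chunk writes (three full 16,384-byte chunks plus an 848-byte tail). The `max_workers` setting plays the role of the throttle: several writers sharing one pool can have at most that many backend writes in flight at once.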