Exploration of Lossy Compression for Application-Level Checkpoint/Restart
Abstract
The scale of high performance computing (HPC) systems is growing exponentially, potentially causing a prohibitive shrinkage of the mean time between failures (MTBF), while the overall increase in the I/O performance of parallel file systems will lag far behind the increase in scale. As such, there have been various attempts to decrease checkpoint overhead, one of which is to apply compression techniques to the checkpoint files. Most existing techniques focus on lossless compression, so their compression rates, and thus their effectiveness, remain rather limited. Instead, we propose a lossy compression technique for checkpoints based on wavelet transformation, and explore its impact on application results. Experimental application of our lossy compression technique to a production climate application, NICAM, shows that the overall checkpoint time, including compression, is reduced by 81%, while the relative error, averaged over all variables of the compressed physical quantities, remains fairly constant at approximately 1.2% compared with the original checkpoint without compression.
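The general idea of wavelet-based lossy compression with a measured relative error can be illustrated with a minimal sketch. It is not the method evaluated in the paper, whose wavelet, quantization scheme, and encoder are not specified in this abstract. The sketch assumes a single-level 1D Haar transform, uniform quantization of the high-frequency band only, and zlib as the final lossless stage. All function names, the `step` parameter, and the synthetic input field are hypothetical.

```python
import math
import struct
import zlib


def haar_forward(values):
    """One level of the 1D Haar transform (assumes an even number of values)."""
    low = [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    return low, high


def haar_inverse(low, high):
    """Rebuild the signal from pairwise averages and differences."""
    out = []
    for avg, diff in zip(low, high):
        out.extend((avg + diff, avg - diff))
    return out


def compress(values, step):
    """Lossy step: round the high band to integer multiples of `step`."""
    low, high = haar_forward(values)
    quantized = [round(d / step) for d in high]
    payload = struct.pack(f"<{len(low)}d", *low)
    payload += struct.pack(f"<{len(quantized)}i", *quantized)
    return zlib.compress(payload), len(low)


def decompress(blob, n_low, step):
    """Undo the lossless stage, dequantize, and invert the transform."""
    payload = zlib.decompress(blob)
    low = struct.unpack_from(f"<{n_low}d", payload, 0)
    quantized = struct.unpack_from(f"<{n_low}i", payload, 8 * n_low)
    high = [k * step for k in quantized]
    return haar_inverse(low, high)


def mean_relative_error(original, restored):
    """Average of |original - restored| / |original| (assumes nonzero values)."""
    total = sum(abs(o - r) / abs(o) for o, r in zip(original, restored))
    return total / len(original)


# Synthetic smooth field standing in for one checkpointed variable.
field = [280.0 + 10.0 * math.sin(i / 50.0) for i in range(4096)]
blob, n_low = compress(field, step=0.01)
restored = decompress(blob, n_low, step=0.01)
print(len(blob), 8 * len(field), mean_relative_error(field, restored))
```

In this sketch a larger `step` trades accuracy for a smaller compressed size, which is the same trade-off the abstract reports between checkpoint time and relative error.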