DocumentCode :
3200584
Title :
Exploration of Lossy Compression for Application-Level Checkpoint/Restart
Author :
Sasaki, Naoto ; Sato, Kento ; Endo, Toshio ; Matsuoka, Satoshi
Author_Institution :
Dept. of Math. & Comput. Sci., Tokyo Inst. of Technol., Tokyo, Japan
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
914
Lastpage :
922
Abstract :
The scale of high performance computing (HPC) systems is growing exponentially, potentially causing a prohibitive shrinkage of the mean time between failures (MTBF), while the overall increase in the I/O performance of parallel file systems will lag far behind the increase in scale. As such, there have been various attempts to decrease checkpoint overhead, one of which is to apply compression techniques to checkpoint files. Most existing techniques focus on lossless compression, but their compression rates, and thus their effectiveness, remain rather limited. Instead, we propose a lossy compression technique for checkpoints based on wavelet transformation, and explore its impact on application results. Experimental application of our lossy compression technique to a production climate application, NICAM, shows that the overall checkpoint time, including compression, is reduced by 81%, while the relative error remains fairly constant at approximately 1.2% on average over all variables of the compressed physical quantities, compared with the original checkpoint without compression.
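The general idea of wavelet-based lossy compression mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-level Haar transform, the uniform quantization step, the synthetic `field` data, and all function names are assumptions chosen only for illustration. The sketch transforms an array, quantizes the detail coefficients (the lossy step), reconstructs the array, and measures the mean relative error against the original.

```python
import math

def haar_forward(x):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def haar_inverse(low, high):
    """Exact inverse of haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(low, high):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def quantize(coeffs, step):
    """Uniform quantization to integers: this is where information is lost."""
    return [round(c / step) for c in coeffs]

def dequantize(q, step):
    return [v * step for v in q]

def lossy_roundtrip(x, step):
    """Quantize only the detail (high-frequency) coefficients, then rebuild."""
    low, high = haar_forward(x)
    q = quantize(high, step)
    return haar_inverse(low, dequantize(q, step)), q

def mean_relative_error(orig, approx):
    return sum(abs(a - o) / abs(o) for o, a in zip(orig, approx)) / len(orig)

# Hypothetical smooth field standing in for one checkpointed variable.
field = [280.0 + 5.0 * math.sin(0.1 * i) for i in range(64)]
restored, q = lossy_roundtrip(field, step=0.5)
err = mean_relative_error(field, restored)
print(f"mean relative error: {err:.6f}")
```

In this sketch the quantized detail coefficients `q` take only a few small integer values, which is what would make a subsequent entropy coder effective. A larger `step` trades a higher relative error for a smaller set of distinct values.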
Keywords :
checkpointing; data compression; distributed databases; parallel processing; wavelet transforms; HPC systems; MTBF; NICAM; application-level checkpoint; application-level restart; checkpoint files; compressed physical quantities; compression rates; high performance computing systems; lossy compression exploration; mean time between failures; parallel file systems; production climate application; wavelet transformation; Arrays; Checkpointing; Computational modeling; Data models; Image coding; Quantization (signal); Wavelet transforms; checkpoint; fault tolerance; lossy compression;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
ISSN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2015.67
Filename :
7161577