Title :
Design and Implementation of Effective Checkpointing for Multithreaded Applications on Future Clouds
Author :
Jangjaimon, Itthichok ; Tzeng, Nian-Feng
Author_Institution :
Center for Adv. Comput. Studies, Univ. of Louisiana at Lafayette, Lafayette, LA, USA
Date :
June 28 2013 - July 3 2013
Abstract :
Multithreaded applications are common in high-performance cloud computing systems, where they can take advantage of the elastic resource availability and cost fluctuation inherent to such systems. When an application runs many threads over more cores leased from the RaaS (Resource-as-a-Service) cloud under spot instance pricing for faster execution, resource unavailability is more likely to occur, undercutting the execution performance gains those additional cores could offer. Checkpointing is therefore required to lower the adverse impact of resource unavailability on the execution performance of such multithreaded applications. Given that checkpointing often incurs expensive I/O to remote storage, this work presents the design and implementation of our adaptive incremental checkpointing (AIC) for multithreaded applications on RaaS clouds. AIC uses idle cores for adaptive delta compression and remote checkpointing, significantly reducing the expected job turnaround time and the aggregated file size at remote storage. To ensure high compatibility and portability for AIC, we exploit techniques that avoid using kernel-specific data structures. AIC has been evaluated using PARSEC benchmarks on our established testbed, which resembles a multicore system acquired from the RaaS cloud. The results show that AIC reduces the expected turnaround time by up to 37% and the aggregated file size by up to 8.3× when compared to a recent multi-level checkpointing scheme with fixed checkpoint intervals.
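The general idea behind incremental checkpointing with delta compression can be illustrated with a minimal sketch. This is not the authors' AIC implementation: the page size, the XOR-plus-zlib delta encoding, and the names `take_checkpoint` and `restore` are all assumptions chosen for illustration, and the adaptive part of AIC is not modeled.

```python
import zlib

PAGE_SIZE = 4096  # assumed page granularity for this sketch


def split_pages(memory, page_size=PAGE_SIZE):
    """Split a memory image (bytes) into fixed-size pages."""
    return [memory[i:i + page_size] for i in range(0, len(memory), page_size)]


def take_checkpoint(memory, prev_pages=None):
    """Build an incremental checkpoint against the previous page list.

    Unchanged pages are skipped. Changed pages are stored as a compressed
    XOR delta against the old page; new or resized pages are stored in full.
    Returns (checkpoint, pages), where pages feeds the next call.
    """
    pages = split_pages(memory)
    stored = {}
    for idx, page in enumerate(pages):
        old = None
        if prev_pages is not None and idx < len(prev_pages):
            old = prev_pages[idx]
        if old == page:
            continue  # clean page: nothing to write
        if old is not None and len(old) == len(page):
            xored = bytes(a ^ b for a, b in zip(old, page))
            stored[idx] = ("xor", zlib.compress(xored))
        else:
            stored[idx] = ("full", zlib.compress(page))
    return {"num_pages": len(pages), "pages": stored}, pages


def restore(base_pages, checkpoint):
    """Rebuild the memory image from the base pages plus one checkpoint."""
    result = []
    for idx in range(checkpoint["num_pages"]):
        entry = checkpoint["pages"].get(idx)
        if entry is None:
            result.append(base_pages[idx])  # page was unchanged
            continue
        kind, payload = entry
        data = zlib.decompress(payload)
        if kind == "xor":
            data = bytes(a ^ b for a, b in zip(base_pages[idx], data))
        result.append(data)
    return b"".join(result)
```

In this sketch, a mostly unchanged page produces an XOR delta that is mostly zero bytes, which compresses well; that is one reason delta compression can shrink the data sent to remote storage. The compression work is CPU-bound, so it is the kind of task that could be offloaded to otherwise idle cores.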
Keywords :
checkpointing; cloud computing; multi-threading; multiprocessing systems; resource allocation; AIC; PARSEC benchmarks; RaaS cloud; adaptive delta compression; adaptive incremental checkpointing; checkpoint intervals; cost fluctuation; elastic resource availability; execution performance gains; high performance cloud computing systems; multicore system; multilevel checkpointing scheme; multithreaded applications; remote checkpointing; resource-as-a-service cloud; spot instance pricing; Benchmark testing; Checkpointing; Hardware; Instruction sets; Multicore processing; Pricing; Silicon; Adaptive checkpointing; RaaS clouds; delta compression; fault tolerance; incremental checkpointing; networked multicore systems;
Conference_Title :
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
DOI :
10.1109/CLOUD.2013.57