Title :
Programming support and adaptive checkpointing for high-throughput data services with log-based recovery
Author :
Zhou, Jingyu ; Zhang, Caijie ; Tang, Hong ; Wu, Jiesheng ; Yang, Tao
Author_Institution :
Comput. Sci. Dept., Shanghai Jiao Tong Univ., Shanghai, China
Date :
June 28 - July 1, 2010
Abstract :
Many applications in large-scale data mining and offline processing are organized as network services that run continuously or for long periods of time. To sustain high throughput, these services often keep their data in memory, which leaves that data susceptible to failures. The availability requirement for these services is not as stringent as for online services exposed to millions of users, but data-intensive offline and mining applications still require data persistence to survive failures. This paper presents SLACH, programming and runtime support for building multi-threaded, high-throughput persistent services. To keep in-memory objects persistent, SLACH employs application-assisted logging and checkpointing for log-based recovery while maximizing throughput and concurrency. SLACH adaptively adjusts the checkpointing frequency based on log growth and throughput demand to balance runtime overhead against recovery speed. This paper describes the design and API of SLACH, its adaptive checkpoint control, and our experiences and experiments in using SLACH at Ask.com.
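The trade-off the abstract describes (checkpoint more often to shorten recovery, less often to protect throughput) can be illustrated with a minimal sketch. This is not SLACH's API or its actual control algorithm, which the abstract does not specify; the class name, the two log-size limits, and the load threshold below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    """Hypothetical policy; thresholds are illustrative, not taken from SLACH."""

    soft_log_limit: int = 64 * 1024 * 1024    # bytes: checkpoint if load permits
    hard_log_limit: int = 512 * 1024 * 1024   # bytes: checkpoint regardless of load
    busy_threshold: float = 0.8               # fraction of peak throughput

    def should_checkpoint(self, log_bytes: int, load: float) -> bool:
        """Return True if a checkpoint should start now.

        log_bytes: size of the log written since the last checkpoint.
        load: current request rate divided by the peak sustainable rate.
        """
        if log_bytes >= self.hard_log_limit:
            # Recovery would have to replay too much log; checkpoint even under load.
            return True
        if log_bytes >= self.soft_log_limit and load < self.busy_threshold:
            # The log is sizeable and the service has spare capacity.
            return True
        return False
```

Under these assumptions, a service would call `should_checkpoint` periodically with the current log size and load: a growing log pushes toward checkpointing, while high throughput demand defers it until the hard limit is reached.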
Keywords :
data mining; multi-threading; adaptive checkpointing; data services; log-based recovery; offline processing; online services; programming support; Availability; Buildings; Checkpointing; Concurrent computing; Data mining; Frequency; Large-scale systems; Programmable control; Runtime; Throughput
Conference_Title :
Dependable Systems and Networks (DSN), 2010 IEEE/IFIP International Conference on
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4244-7500-1
Electronic_ISBN :
978-1-4244-7499-8
DOI :
10.1109/DSN.2010.5545015