Title :
Fault-Tolerance Mechanism of Computation Grid Service System Based on Mobile Agent
Author :
Zhang, Zhirou ; Li, Ying
Author_Institution :
Network & Information Center, North China Electric Power University, Beijing
Abstract :
Building a Computation Grid Service System from an organization's idle computers, so that it can provide computation services for Mobile Agents, can reduce the cost of high-performance computing and make full use of idle resources. However, such a system needs a fault-tolerance mechanism to guarantee that computation tasks keep running when nodes or networks in the system fail. This paper studies the three main parts of the system's fault-tolerance mechanism. It proposes an adaptive fault-detection mechanism, a non-close, non-blocking, low-overhead checkpointing mechanism, and a partial rollback mechanism based on communication domains, all of which reduce the overhead of fault tolerance. Experiments have shown their advantages.
Keywords :
fault tolerant computing; grid computing; mobile agents; checkpointing mechanism; computation grid service system; fault-tolerance mechanism; high-performance computing; mobile agent; partial rollback mechanism; Checkpointing; Communication system control; Computer architecture; Computer networks; Fault tolerance; Fault tolerant systems; Grid computing; High performance computing; Mobile agents; Mobile communication; Computation Grid; Fault-Tolerance; Partial Rollback
Conference_Title :
Computing, Communication, Control, and Management, 2008. CCCM '08. ISECS International Colloquium on
Conference_Location :
Guangzhou
Print_ISBN :
978-0-7695-3290-5
DOI :
10.1109/CCCM.2008.39