Title :
A fault-tolerance mechanism in grid
Author :
Liang, Jin ; WeiQin, Tong ; JianQuan, Tang ; Bo, Wang
Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., China
Abstract :
Grid computing has emerged as an effective technology for coupling geographically distributed resources to solve large-scale problems over wide area networks. Fault tolerance in a grid system is a significant and complex issue in securing stable and reliable performance. Various techniques already exist for detecting and correcting faults in distributed computing systems. Unfortunately, little effort has focused on fault tolerance in grid environments, especially since the emergence of OGSA. A new fault-tolerant mechanism is needed to detect and recover from service faults and node crashes. Based on our previous work on Java thread state capturing and existing mobile agent techniques, we put forward a fault-tolerant mechanism that provides effective fault-handling and recovery methods.
Keywords :
Java; fault tolerant computing; grid computing; mobile agents; multi-threading; system recovery; wide area networks; Java thread; distributed computing system; fault tolerance; geographically distributed resource; grid system; mobile agent technique; service fault recovery; wide area network; Computer crashes; Distributed computing; Fault detection; Fault tolerance; Fault tolerant systems; Java; Large-scale systems; Mobile agents; Wide area networks; Yarn;
Conference_Title :
Industrial Informatics, 2003. INDIN 2003. Proceedings. IEEE International Conference on
Print_ISBN :
0-7803-8200-5
DOI :
10.1109/INDIN.2003.1300379