DocumentCode :
244407
Title :
Grid-Oriented Process Clustering System for Partial Message Logging
Author :
Jitsumoto, Hideyuki ; Todoroki, Yuki ; Ishikawa, Yozo ; Sato, Mitsuhisa
Author_Institution :
Inf. Technol. Center, Univ. of Tokyo, Tokyo, Japan
fYear :
2014
fDate :
23-26 June 2014
Firstpage :
714
Lastpage :
719
Abstract :
In a computer cluster composed of many nodes, the mean time between failures becomes shorter as the number of nodes increases. As a result, long-running tasks may be unable to complete because failures interrupt them. Fault tolerance has therefore become an essential part of high-performance computing. Partial message logging groups processes into clusters, coordinates checkpoints within each cluster, and logs the messages exchanged between clusters. Our study proposes a system with two features that improve the efficiency of partial message logging: 1) the communication log used for clustering is recorded at runtime, and 2) a graph partitioning algorithm reduces the complexity of the system by geometrically partitioning a grid graph. The proposed system is evaluated by executing a scientific application, and the resulting process clustering is compared with existing methods in terms of clustering performance and quality.
Keywords :
checkpointing; computational complexity; fault tolerant computing; graph theory; grid computing; message passing; natural sciences computing; parallel processing; checkpoints; clustering performance; communication log; complexity reduction; computer cluster; fault tolerance; graph partitioning algorithm; grid graph geometric partitioning; grid-oriented process clustering system; high-performance computing; mean time between failure; partial message logging; scientific application; Computational complexity; Fault tolerance; Fault tolerant systems; Partitioning algorithms; Runtime; Three-dimensional displays; Topology; fault tolerance; graph partition; message logging;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location :
Atlanta, GA
Type :
conf
DOI :
10.1109/DSN.2014.72
Filename :
6903630