DocumentCode :
430993
Title :
Lightweight fault detection for shared virtual memory clusters
Author :
Kongmunvattana, Angkul ; Varol, Yaakov ; Zeng, Jiangtao
Author_Institution :
Dept. of Comput. Sci. & Eng., Nevada Univ., Reno, NV, USA
Volume :
B
fYear :
2004
fDate :
21-24 Nov. 2004
Firstpage :
171
Abstract :
Shared virtual memory (SVM) is a practical approach for providing a simple parallel programming environment on a cluster of computers, since it permits programmers to assume the existence of a shared memory image across physically distributed memory systems, obviating the need for explicit message-passing operations. While several fault-tolerant techniques for crash recovery support in SVM have been proposed and studied extensively, very little attention has been given to fault detection issues. In this paper, we propose and evaluate a new fault detection technique, called lightweight fault detection (LFD). Our experimental results confirmed that LFD provides swift fault detection support to SVM and incurs very little overhead (1.42% on average) during failure-free execution. Hence, combining LFD with previously proposed crash recovery techniques makes cluster computing on SVM more reliable and attractive.
Keywords :
distributed shared memory systems; fault tolerant computing; message passing; parallel programming; system recovery; virtual storage; workstation clusters; cluster computing; crash recovery; distributed memory systems; fault-tolerant techniques; lightweight fault detection; message passing operations; parallel programming environment; shared virtual memory cluster; Computer crashes; Concurrent computing; Distributed computing; Fault detection; Fault tolerance; Message passing; Parallel programming; Physics computing; Programming profession; Support vector machines;
fLanguage :
English
Publisher :
ieee
Conference_Title :
TENCON 2004. 2004 IEEE Region 10 Conference
Print_ISBN :
0-7803-8560-8
Type :
conf
DOI :
10.1109/TENCON.2004.1414559
Filename :
1414559