• DocumentCode
    430993
  • Title
    Lightweight fault detection for shared virtual memory clusters
  • Author
    Kongmunvattana, Angkul ; Varol, Yaakov ; Zeng, Jiangtao
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Nevada Univ., Reno, NV, USA
  • Volume
    B
  • fYear
    2004
  • fDate
    21-24 Nov. 2004
  • Firstpage
    171
  • Abstract
    Shared virtual memory (SVM) is a practical approach for providing a simple parallel programming environment on a cluster of computers, since it lets programmers assume the existence of a shared memory image across physically distributed memory systems, obviating the need for explicit message passing operations. While several fault-tolerant techniques for crash recovery support in SVM have been proposed and studied extensively, very little attention has been given to fault detection issues. In this paper, we propose and evaluate a new fault detection technique, called lightweight fault detection (LFD). Our experimental results confirm that LFD provides swift fault detection support to SVM and incurs very little overhead (1.42% on average) during failure-free execution. Hence, combining LFD with previously proposed crash recovery techniques makes cluster computing on SVM more reliable and attractive.
  • Keywords
    distributed shared memory systems; fault tolerant computing; message passing; parallel programming; system recovery; virtual storage; workstation clusters; cluster computing; crash recovery; distributed memory systems; fault-tolerant techniques; lightweight fault detection; message passing operations; parallel programming environment; shared virtual memory cluster; Computer crashes; Concurrent computing; Distributed computing; Fault detection; Fault tolerance; Message passing; Parallel programming; Physics computing; Programming profession; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON 2004. 2004 IEEE Region 10 Conference
  • Print_ISBN
    0-7803-8560-8
  • Type
    conf
  • DOI
    10.1109/TENCON.2004.1414559
  • Filename
    1414559