DocumentCode :
2062067
Title :
Functional Tests of the RADIC Fault Tolerance Architecture
Author :
Duarte, Angelo ; Rexachs, Dolores ; Luque, Emilio
Author_Institution :
Comput. Archit. & Oper. Syst. Dept., Univ. Autonoma de Barcelona
fYear :
2007
fDate :
7-9 Feb. 2007
Firstpage :
278
Lastpage :
287
Abstract :
Clusters with thousands of nodes are a reality, and the current trend indicates that they are becoming larger. Such large clusters are subject to a relatively high fault frequency, so a fault-tolerance scheme is mandatory to assure correct application completion. Message passing is the programming model often used in large clusters, and the current implementations used to achieve fault tolerance in message-passing systems do not focus on an architecture that simultaneously addresses scalability, transparency, and independence from stable or central elements. The RADIC architecture was proposed and designed as a fully distributed structure in order to meet these requirements. The architecture defines a fully distributed fault tolerance controller implemented by a set of system processes, which collaborate to perform all the basic functions of a fault tolerance protocol. This paper presents the test methodology used to verify the functionality of the RADIC architecture using RADICMPI, a prototype built on MPI semantics.
Keywords :
fault tolerant computing; message passing; parallel architectures; RADIC fault tolerance architecture; RADICMPI; fully distributed structure; functional tests; message passing; programming model; Collaboration; Control systems; Distributed control; Fault tolerance; Fault tolerant systems; Frequency; Message passing; Protocols; Scalability; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel, Distributed and Network-Based Processing, 2007. PDP '07. 15th EUROMICRO International Conference on
Conference_Location :
Napoli
ISSN :
1066-6192
Print_ISBN :
0-7695-2784-1
Type :
conf
DOI :
10.1109/PDP.2007.45
Filename :
4135288