DocumentCode :
3056183
Title :
RAS by the Yard
Author :
Wood, Alan ; Nathan, Swami
Author_Institution :
Sun Microsystems Inc., Santa Clara
fYear :
2007
fDate :
25-28 June 2007
Firstpage :
606
Lastpage :
611
Abstract :
Different applications require different levels of fault tolerance. Therefore, it is important to create a flexible architecture that allows a customer to choose the appropriate amount of fault tolerance, a concept we call "RAS by the yard." In this paper, we describe a next-generation supercomputer and the design flexibility that allows us to offer a range of alternatives for RAS (reliability, availability, serviceability). In particular, we explain how checkpointing can provide an availability continuum. Design alternatives that improve RAS may be expensive, so it is important to do cost/benefit studies of the alternatives. For a fixed budget and specified system balance ratios, such as Bytes/FLOPS, we analyze the system performance impact of alternative RAS strategies. We show how to optimize the amount of RAS purchased by using a performability measure.
Keywords :
parallel machines; performance evaluation; availability; fault tolerance; flexible architecture; next generation supercomputer; reliability; serviceability; Availability; Bandwidth; Checkpointing; Cost benefit analysis; Fault tolerance; Performance analysis; Performance evaluation; Sun; Supercomputers; System performance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Systems and Networks, 2007. DSN '07. 37th Annual IEEE/IFIP International Conference on
Conference_Location :
Edinburgh
Print_ISBN :
0-7695-2855-4
Type :
conf
DOI :
10.1109/DSN.2007.80
Filename :
4273011