DocumentCode :
1559493
Title :
ROC-1: hardware support for recovery-oriented computing
Author :
Oppenheimer, David ; Brown, Aaron ; Beck, James ; Hettena, Daniel ; Kuroda, Jon ; Treuhaft, N. ; Patterson, David A. ; Yelick, Katherine
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA
Volume :
51
Issue :
2
fYear :
2002
fDate :
2/1/2002
Firstpage :
100
Lastpage :
107
Abstract :
We introduce the ROC-1 hardware platform, a large-scale cluster system designed to provide high availability for Internet service applications. The ROC-1 prototype embodies our philosophy of recovery-oriented computing (ROC) by emphasizing detection of and recovery from the failures that inevitably occur in Internet service environments, rather than simple avoidance of such failures. ROC-1 promises greater availability than existing server systems by incorporating four techniques applied from the ground up to both hardware and software: redundancy and isolation, online self-testing and verification, support for problem diagnosis, and concern for human interaction with the system.
Keywords :
Internet; automatic testing; fault diagnosis; fault tolerant computing; redundancy; system recovery; user interfaces; workstation clusters; Internet service application availability; ROC-1 hardware platform; computer network management; failure detection; failure recovery; fault tolerance; human-system interaction; isolation; large-scale cluster system; network server systems; online self-testing; online verification; problem diagnosis; recovery-oriented computing; software; Availability; Built-in self-test; Ground support; Hardware; Humans; Large-scale systems; Prototypes; Web and internet services; Web server
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/12.980002
Filename :
980002