DocumentCode :
3000817
Title :
Resilience to Various Failures for Read-mostly In-memory Data Structures
Author :
Kaplan, Larry ; Ohlrich, Miles ; Briggs, Preston ; Leslie, Will
Author_Institution :
Cray Inc., Seattle, WA, USA
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
1572
Lastpage :
1580
Abstract :
As massively parallel processing (MPP) machines and their associated applications become larger, more work on resiliency is needed if those applications are to have a chance of running for significant lengths of time in the face of the expected component failure rates. This paper describes an approach for protecting large read-mostly in-memory data structures from various forms of failures by applying the concept of software erasure-correcting codes. A prototype library for this scheme was implemented on the Cray XMT and applied to a sample application. It is also portable to other global shared memory architectures that meet certain requirements, including the Cray XE.
Keywords :
data structures; fault tolerant computing; parallel processing; shared memory systems; system recovery; Cray XE; Cray XMT; component failure rates; failure resilience; massively parallel processing machines; read-mostly in-memory data structures; shared memory architectures; software erasure-correcting codes; Data structures; Databases; Face; Libraries; Memory management; Registers; Xenon; data structures; erasure codes; resilience;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
Type :
conf
DOI :
10.1109/IPDPSW.2012.198
Filename :
6270830