DocumentCode :
568637
Title :
Transient Fault Tolerance for ccNUMA Architecture
Author :
Xingjun Zhang ; Endong Wang ; Feilong Tang ; Meishun Yang ; Hengyi Wei ; Xiaoshe Dong
Author_Institution :
Dept. of Comput. Sci. & Technol., Xi'an Jiaotong Univ., Xi'an, China
fYear :
2012
fDate :
4-6 July 2012
Firstpage :
197
Lastpage :
202
Abstract :
Transient faults are a critical concern for the reliability of microprocessor systems. Software fault tolerance is more flexible and less costly than hardware fault tolerance. Moreover, as architectural trends point toward multicore designs, there is substantial interest in adapting parallel and redundant hardware resources for transient fault tolerance. This paper proposes a process-level fault tolerance technique, a software-centric approach that efficiently schedules and synchronizes redundant processes using the processor redundancy of ccNUMA systems. It thereby improves the efficiency with which the redundant processes run and reduces time and space overhead. The paper focuses on methods for detecting and handling errors in the redundant processes. A real prototype, designed to be transparent to the application, is implemented. Test results show that the system detects, in a timely manner, CPU and memory soft errors that cause exceptions in the redundant processes, while ensuring that the application's service is uninterrupted and incurs only a short delay.
Keywords :
delays; error detection; error handling; memory architecture; multiprocessing systems; processor scheduling; redundancy; software fault tolerance; synchronisation; CPU; ccNUMA architecture; ccNUMA processor redundancy; delay; error detection method; error handling method; microprocessor system; multicore design; parallel resource; process level fault tolerance; processor scheduling; prototype; reliability; soft error detection; software centric approach; synchronization; transient fault tolerance; Fault tolerant systems; Hardware; Kernel; Redundancy; Synchronization; Transient analysis; Transient fault; ccNUMA; dual-process;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth International Conference on
Conference_Location :
Palermo
Print_ISBN :
978-1-4673-1328-5
Type :
conf
DOI :
10.1109/IMIS.2012.188
Filename :
6296854