DocumentCode :
3057911
Title :
Software-Based Detecting and Recovering from ECC-Memory Faults
Author :
Zhang, Xingjun ; Wang, Endong ; Zhang, Dong ; Wang, Yu ; Wu, Weiguo ; Dong, Xiaoshe
Author_Institution :
Dept. of Comput. Sci. & Technol., Xi'an Jiaotong Univ., Xi'an, China
fYear :
2011
fDate :
Nov. 30 2011-Dec. 2 2011
Firstpage :
715
Lastpage :
719
Abstract :
To address the problem that ECC cannot correct multibit errors in ECC memory, this paper proposes a software-level method for handling memory errors. Built on a modified Linux kernel, the method determines the area affected by a memory error by looking up the process information mapped to the faulty address. This avoids the losses users incur when a memory error halts the system. The experimental results show that the method can repair memory errors to a certain degree without affecting the normal operation of the system.
Keywords :
Linux; error handling; fault diagnosis; storage management; ECC-memory fault; Linux kernel code; information mapping process; memory error processing method; memory error repair; software-based detection; software-based recovery; system halting; Computers; Error correction codes; Kernel; Linux; Reliability; Servers; ECC; error handling; reverse mapping;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Networking and Collaborative Systems (INCoS), 2011 Third International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4577-1908-0
Type :
conf
DOI :
10.1109/INCoS.2011.148
Filename :
6132897