DocumentCode :
3057896
Title :
A Scalable Fault Management Architecture for ccNUMA Server
Author :
Yang, Yan ; Zhang, Xingjun ; Wang, Endong ; Wu, Nan ; Dong, Xiaoshe
Author_Institution :
Dept. of Comput. Sci. & Technol., Xi'an Jiaotong Univ., Xi'an, China
fYear :
2011
fDate :
Nov. 30 2011-Dec. 2 2011
Firstpage :
709
Lastpage :
714
Abstract :
Linux servers with heterogeneous architectures present a new challenge for fault management. As the number and variety of hardware components grow significantly, managing faults separately for each component becomes more complex and inefficient. A modern fault management system therefore needs centralized management, automatic recovery, and a scalable design. Based on the ccNUMA architecture, the paper proposes a scalable fault management architecture and studies the technologies for implementing it. The aim is to enable computers to detect errors, diagnose them, and handle faults automatically. The architecture uses a modular design and supports distributed environments with good extensibility and scalability. In practice, the architecture is effective and can raise the reliability of servers.
Keywords :
Linux; fault tolerance; memory architecture; Linux server; automatic recovering; ccNUMA server; centralized management; distributed environment; modular design; scalable design; scalable fault management architecture; Computer architecture; Hardware; Kernel; Linux; Monitoring; Servers; Fault tolerance; Scalable Fault management; ccNUMA;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Networking and Collaborative Systems (INCoS), 2011 Third International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4577-1908-0
Type :
conf
DOI :
10.1109/INCoS.2011.35
Filename :
6132896