Title :
What Is System Hang and How to Handle It
Author :
Yian Zhu ; Yue Li ; Jingling Xue ; Tian Tan ; Jialong Shi ; Yang Shen ; Chunyan Ma
Author_Institution :
Sch. of Comput. Sci., Northwestern Polytech. Univ., Xi'an, China
Abstract :
Almost every computer user has encountered an unresponsive system failure, or system hang, which leaves the user no choice but to power off the computer. In this paper, the causes of such failures are analyzed in detail and an empirical hypothesis for detecting system hang is proposed. This hypothesis exploits a small set of system performance metrics provided by the OS itself, thereby avoiding both modification of the OS kernel and additional cost (e.g., hardware modules). Under this hypothesis, we propose SHFH, a self-healing framework for handling system hang that can be deployed dynamically on the OS. One distinctive feature of SHFH is that its "light-heavy" detection strategy is designed to make intelligent tradeoffs between the performance overhead and the false positive rate induced by system hang detection. Another feature is that its diagnosis-based recovery strategy offers finer-grained recovery from system hang. Our experimental results show that SHFH can cover 95.34% of system hang scenarios with a false positive rate of 0.58% and a performance overhead of 0.6%, validating the effectiveness of our empirical hypothesis.
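The abstract does not specify how the "light-heavy" detection strategy works internally. The general idea of such a two-tier tradeoff can be illustrated with a minimal sketch, assuming a cheap "light" check on a few periodically sampled metrics that only escalates to a costly "heavy" check when it sees something suspicious. The metric names (`cpu_util`, `ctx_switch_rate`), thresholds, and the `heavy_check` callback are all hypothetical and are not SHFH's actual metrics or algorithm. Running the heavy check only on suspicion keeps overhead low, and requiring it to confirm each light alarm keeps false positives low.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Sample:
    # Hypothetical OS-provided metrics; not the metric set used by SHFH.
    cpu_util: float         # CPU utilization as a fraction in [0, 1]
    ctx_switch_rate: float  # context switches per second


def light_check(s: Sample, cpu_hi: float = 0.95, ctx_lo: float = 50.0) -> bool:
    """Cheap test on a few metrics; True means a *possible* hang (hypothetical thresholds)."""
    return s.cpu_util >= cpu_hi or s.ctx_switch_rate <= ctx_lo


def detect(samples: Iterable[Sample],
           heavy_check: Callable[[], bool]) -> Tuple[bool, int]:
    """Run the light check on every sample and call the expensive heavy check
    only when the light check raises a suspicion.

    Returns (hang_confirmed, number_of_heavy_checks_run).
    """
    heavy_calls = 0
    for s in samples:
        if light_check(s):
            heavy_calls += 1
            if heavy_check():  # the heavy check confirms or rejects the alarm
                return True, heavy_calls
    return False, heavy_calls
```

For example, `detect([Sample(0.5, 1000.0), Sample(0.99, 1000.0)], lambda: True)` runs the heavy check once (on the second sample) and reports a hang, while a light alarm that the heavy check rejects is not reported.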
Keywords :
operating systems (computers); program diagnostics; software metrics; system recovery; OS; SHFH; diagnosis-based recovery strategy; light-heavy detection strategy; self-healing framework; system hang detection; system performance metrics; unresponsive system failure; Computers; Educational institutions; Hardware; Kernel; Measurement; Monitoring; System performance; Fault Detection and Recovery; Operating System; Self-Healing Framework; System Hang;
Conference_Title :
Software Reliability Engineering (ISSRE), 2012 IEEE 23rd International Symposium on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-4638-2
DOI :
10.1109/ISSRE.2012.12