DocumentCode :
3077821
Title :
ProvErr: System Level Statistical Fault Diagnosis Using Dependency Model
Author :
Peng Chen ; Plale, Beth A.
Author_Institution :
Sch. of Inf. & Comput., Indiana Univ., Bloomington, IN, USA
fYear :
2015
fDate :
4-7 May 2015
Firstpage :
525
Lastpage :
534
Abstract :
Large-scale distributed systems are difficult to debug when failures occur, yet rapid fault diagnosis that pinpoints a failure to the component level is critical for fast recovery. We introduce a statistical approach to fault diagnosis that uses a dependency graph of the execution to automatically discover the most probable fault cause(s) at the component level (either a software component or a hardware resource). This approach leverages engineers' high-level understanding of the system and requires much less information than existing methods. It also uses dependency information to eliminate redundant causes while retaining co-causes. Experiments using Apache Pig show that our approach performs well and robustly when diagnosing software bugs and resource shortages, and that it scales nearly linearly as system size increases.
Keywords :
fault diagnosis; system recovery; Apache Pig; ProvErr; dependency model; execution dependency graph; system level statistical fault diagnosis; Buildings; Computer bugs; Fault diagnosis; Hardware; Knowledge engineering; Runtime; Software;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
Conference_Location :
Shenzhen
Type :
conf
DOI :
10.1109/CCGrid.2015.86
Filename :
7152518