DocumentCode
36283
Title
Adaptive Algorithms for Diagnosing Large-Scale Failures in Computer Networks
Author
Tati, Srikar ; Bong Jun Ko ; Guohong Cao ; Swami, Ananthram ; La Porta, Thomas F.
Author_Institution
Inst. for Networking & Security Res., Pennsylvania State Univ., University Park, PA, USA
Volume
26
Issue
3
fYear
2015
fDate
Mar-15
Firstpage
646
Lastpage
656
Abstract
We propose a greedy algorithm, Cluster-MAX-COVERAGE (CMC), to efficiently diagnose large-scale clustered failures. We primarily address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to output a hypothesis list with a low number of false negatives and false positives quickly. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100 percent accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. When there are fewer positive and negative symptoms at a reporting node, CMC performs much better than existing algorithms. We also propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both independent and clustered failures. During a series of failures that include both independent and clustered, AMC results in a reduced number of false negatives and false positives.
Keywords
computer network reliability; fault diagnosis; greedy algorithms; large-scale systems; workstation clusters; AMC; CMC; adaptive algorithms; adaptive-MAX-COVERAGE; cluster-MAX-COVERAGE; false negatives; false positives; greedy algorithm; large-scale clustered failure diagnosis; Accuracy; Adaptive algorithms; Clustering algorithms; Complexity theory; Computer networks; Fault diagnosis; Network topology; Fault diagnosis; clustered failures; incomplete information; large-scale failures;
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
ieee
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2014.2311814
Filename
6767126
Link To Document