• DocumentCode
    2875686
  • Title
    Shedding Light on Enterprise Network Failures Using Spotlight
  • Author
    John, Dipu ; Prakash, Pawan ; Kompella, Ramana Rao ; Chandra, Ranveer
  • Author_Institution
    Purdue Univ., West Lafayette, IN, USA
  • fYear
    2010
  • fDate
    Oct. 31 2010-Nov. 3 2010
  • Firstpage
    167
  • Lastpage
    176
  • Abstract
    Fault localization in enterprise networks is extremely challenging. A recent approach called Sherlock makes some headway on this problem by running an inference algorithm over a multi-tier probabilistic dependency graph that relates fault symptoms to possible root causes (e.g., routers, servers). A key limitation of Sherlock is its scalability, because it relies on complicated inference algorithms based on Bayesian networks. We present a fault localization system called Spotlight that builds on two basic ideas. First, it compresses the multi-tier dependency graph into a bipartite graph with direct probabilistic edges between root causes and symptoms. Second, it runs a novel weighted greedy minimum set cover algorithm to provide fast inference. Through extensive simulations with real service dependency graphs and enterprise network topologies previously reported in the literature, we show that Spotlight is about 100× faster than Sherlock in typical settings, with comparable diagnostic accuracy.
  • Keywords
    belief networks; business data processing; fault tolerant computing; graph theory; inference mechanisms; probability; Bayesian networks; Sherlock approach; Spotlight system; bipartite graph; enterprise network failure; enterprise network topology; fault localization system; greedy minimum set cover algorithm; inference algorithm; multitier probabilistic dependency graph; service dependency graphs; Accuracy; Bayesian methods; Inference algorithms; Instruments; Network topology; Probabilistic logic; Servers; dependency graphs; enterprise networks; fault localization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Reliable Distributed Systems, 2010 29th IEEE Symposium on
  • Conference_Location
    New Delhi
  • ISSN
    1060-9857
  • Print_ISBN
    978-0-7695-4250-8
  • Type
    conf
  • DOI
    10.1109/SRDS.2010.27
  • Filename
    5623391
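
The second idea in the abstract, greedy set cover over a bipartite graph of root causes and symptoms, can be illustrated with a minimal sketch. This is not the algorithm from the paper: the abstract does not give the weighting scheme, so the scoring rule here (sum of edge probabilities over still-unexplained symptoms) is an assumption, as are the function name `greedy_weighted_cover` and the example component and symptom names.

```python
from typing import Dict, List, Set


def greedy_weighted_cover(
    edges: Dict[str, Dict[str, float]],
    observed: Set[str],
) -> List[str]:
    """Pick root causes that together explain the observed symptoms.

    edges maps each candidate root cause to {symptom: probability}.
    The scoring rule is an assumption for illustration only: a cause's
    score is the sum of its edge probabilities to uncovered symptoms.
    """
    uncovered = set(observed)
    chosen: List[str] = []
    while uncovered:
        best, best_score = None, 0.0
        # Iterate in sorted order so ties resolve deterministically.
        for cause in sorted(edges):
            if cause in chosen:
                continue
            score = sum(p for s, p in edges[cause].items() if s in uncovered)
            if score > best_score:
                best, best_score = cause, score
        if best is None:
            # No remaining cause explains any uncovered symptom.
            break
        chosen.append(best)
        uncovered -= set(edges[best])
    return chosen


if __name__ == "__main__":
    # Hypothetical bipartite graph: root cause -> symptom probabilities.
    edges = {
        "router_r1": {"web_slow": 0.9, "mail_slow": 0.8},
        "server_web": {"web_slow": 0.7},
        "server_dns": {"dns_timeout": 0.95},
    }
    observed = {"web_slow", "mail_slow", "dns_timeout"}
    print(greedy_weighted_cover(edges, observed))
```

Each round is a single pass over the candidate causes, which is the kind of cheap step that makes a greedy cover attractive compared with full Bayesian inference; how Spotlight actually weights and ranks candidates is described in the paper itself.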