• DocumentCode
    47194
  • Title

    Rake: Semantics Assisted Network-Based Tracing Framework

  • Author

    Yao Zhao ; Yinzhi Cao ; Yan Chen ; Ming Zhang ; Ankur Goyal

  • Author_Institution
    Bell Labs, Murray Hill, NJ, USA
  • Volume
    10
  • Issue
    1
  • fYear
    2013
  • fDate
    Mar-13
  • Firstpage
    3
  • Lastpage
    14
  • Abstract
    The ability to trace request execution paths is critical for diagnosing performance faults in large-scale distributed systems. Previous black-box and white-box approaches are either inaccurate or invasive. We present a novel semantics-assisted gray-box tracing approach, called Rake, which can accurately trace individual requests by observing network traffic. Rake infers the causality between messages by identifying polymorphic IDs in messages according to application semantics. To make Rake universally applicable, we design a Rake language so that users can easily describe the necessary semantics of their applications while reusing the core Rake component. We evaluate Rake using a few popular distributed applications, including web search, a distributed computing cluster, a content provider network, and online chatting. Our results demonstrate that Rake is much more accurate than the black-box approaches while requiring no modification to the OS or applications. In the CoralCDN (a content distribution network) experiments, Rake links messages with much higher accuracy than WAP5, a state-of-the-art black-box approach. In the Hadoop (a distributed computing cluster platform) experiments, Rake helps reveal several previously unknown issues that may lead to performance degradation, including an RPC (Remote Procedure Call) abuse problem.
  • Keywords
    Internet; distributed processing; electronic messaging; remote procedure calls; software fault tolerance; CoralCDN; Hadoop; RPC; Rake language; WAP5; Web search; application semantics; black-box approaches; content distributed network; content provider network; distributed applications; distributed computing cluster; large-scale distributed systems; online chatting; performance fault diagnosis; polymorphic ID; remote procedure call; request execution paths; semantics assisted network-based tracing framework; semantics-assisted gray-box tracing approach; white-box approaches; Distributed processing; Performance evaluation; Semantics; XML; Rake; tracing framework;
  • fLanguage
    English
  • Journal_Title
    Network and Service Management, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1932-4537
  • Type

    jour

  • DOI
    10.1109/TNSM.2012.091912.120224
  • Filename
    6313581