• DocumentCode
    2353687
  • Title
    Unified debugging of distributed systems with Recon
  • Author
    Lee, Kyu Hyung ; Sumner, Nick ; Zhang, Xiangyu ; Eugster, Patrick
  • Author_Institution
    Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
  • fYear
    2011
  • fDate
    27-30 June 2011
  • Firstpage
    85
  • Lastpage
    96
  • Abstract
    To scale to today's complex distributed software systems, debugging and replaying techniques mostly focus on single facets of software, e.g., local concurrency, distributed messaging, or data representation. This forces developers to tediously combine different technologies such as instruction-level dynamic tracing, event log analysis, or global state reconstruction to gradually explain non-trivial defects. This paper proposes Recon, a debugging system that provides iterative and interactive homogeneous debugging services. Like related systems, Recon promotes SQL-like queries for debugging distributed systems. Unlike other approaches, however, Recon allows for all system artifacts including nodes, communication channels, events, or instructions to be uniformly described by relations. Also, an application in Recon originally runs with a lightweight logger that only collects replay logs for individual nodes. Developers debug a complete program by replaying the execution with fine-grained instrumentation that is capable of exposing instruction-level information. We illustrate the effectiveness of Recon on programs as diverse as BerkeleyDB, i3/Chord, RandTree, and Pastry. Our evaluation includes executions in local clusters as well as in Amazon EC2 and exhibits an unreported bug in RandTree.
  • Keywords
    SQL; program debugging; program diagnostics; software reliability; systems software; Amazon EC2; BerkeleyDB; Pastry; RandTree; Recon; SQL-like queries; communication channels; complex distributed software systems; event log analysis; global state reconstruction; i3/Chord; instruction-level dynamic tracing; interactive homogeneous debugging services; lightweight logger; nontrivial defects; software development; unreported bug; Computer bugs; Debugging; Distributed databases; Instruments; Protocols; Runtime; Software reliability; debugging; distributed systems; instrumentation; replay
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems & Networks (DSN), 2011 IEEE/IFIP 41st International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1530-0889
  • Print_ISBN
    978-1-4244-9232-9
  • Electronic_ISBN
    1530-0889
  • Type
    conf
  • DOI
    10.1109/DSN.2011.5958209
  • Filename
    5958209