  • DocumentCode
    1966458
  • Title
    Reproducing non-deterministic bugs with lightweight recording in production environments
  • Author
    Wang, Nan; Han, Jizhong; Fu, Haiping; He, Xubin; Fang, Jinyun
  • Author_Institution
    Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
  • fYear
    2010
  • fDate
    9-11 Dec. 2010
  • Firstpage
    89
  • Lastpage
    96
  • Abstract
    Reproducing non-deterministic bugs is challenging. Recording program execution in production environments and then replaying it to reproduce bugs is an effective way to re-enable cyclic debugging. Unfortunately, most current record-replay approaches introduce large perturbations to the environment, the execution flow, or both, in addition to a performance penalty and high storage overhead, which makes them impractical to deploy in production environments. This paper presents Snitchaser, a fully user-space record-replay tool that faithfully reproduces bugs by replaying system calls recorded with negligible perturbation and recording overhead. This is achieved by 1) a novel, lightweight system call interception mechanism that does not patch binary instructions, reducing the perturbation to execution flow; 2) a system call latch to preserve signal semantics; and 3) periodic checkpointing to reduce the storage overhead. Snitchaser focuses on bugs caused by asynchronous events on heavily loaded, high-throughput servers. Experimental results show that Snitchaser is capable of reproducing non-deterministic bugs efficiently at nearly no performance penalty. We also present two case studies on dealing with existing bugs in Lighttpd, a popular software package used in many large-scale systems.
  • Keywords
    checkpointing; product development; program debugging; Lighttpd; Snitchaser; cyclic debugging; execution flow; lightweight recording; lightweight system call interception mechanism; nondeterministic bug reproduction; performance penalty; periodic checkpointing; production environments; program execution recording; system call latch; user-space record-replay tool; Checkpointing; Computer bugs; Debugging; Hardware; Production; Semantics; Throughput
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Performance Computing and Communications Conference (IPCCC), 2010 IEEE 29th International
  • Conference_Location
    Albuquerque, NM
  • ISSN
    1097-2641
  • Print_ISBN
    978-1-4244-9330-2
  • Type
    conf
  • DOI
    10.1109/PCCC.2010.5682332
  • Filename
    5682332