Title :
Gingko: correlating causal paths in distributed systems
Author :
Zhang, Zhihong ; Meng, Dan ; Zhan, Jianfeng ; Wang, Lei ; Jin, Yi ; Wen, Yu ; Wang, Hui
Author_Institution :
Chinese Acad. of Sci., Beijing
Abstract :
Many large-scale systems are distributed systems composed of multiple communicating components. Finding the causal paths of message traces between components in these systems is important for uncovering runtime behavior and identifying the root causes of failures, but this "art" often resides only in the heads of developers or domain experts. Our goal is to design tools and algorithms that help developers record this art in logs and help modestly skilled users and system administrators master it, so that they can better use and manage distributed systems. In this paper, we present a methodology that automatically builds the causal paths of message traces through 1) an agreement with programmers on the style and content of the logs produced by the operational distributed systems they develop, and 2) a correlation algorithm that builds message causal paths from the clues in these logs. To validate this methodology, we have implemented Gingko, a prototype that provides a tool chain for users to better understand distributed systems and to debug them efficiently when errors occur.
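The abstract does not give the log format or the correlation algorithm, so the following is only an illustrative sketch of the general idea, not Gingko's actual method. It assumes a hypothetical log record with a local timestamp, a component name, a thread name, a SEND/RECV kind, and a message id, and it assumes a simple heuristic: a message sent by a thread was caused by the message that thread most recently received. The names `LogRecord` and `build_causal_paths` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical agreed-upon log record (all fields are assumptions)."""
    timestamp: float  # local clock of the logging component
    component: str
    thread: str
    kind: str         # "SEND" or "RECV"
    msg_id: str


def build_causal_paths(records):
    """Return causal paths as lists of message ids, root message first.

    Assumed heuristic: a SEND is caused by the most recent RECV seen in
    the same (component, thread) context. Only per-context ordering is
    used, so clocks need not be synchronized across components as long
    as each component's own timestamps are monotonic.
    """
    parent = {}     # msg_id -> msg_id of the message that caused it
    last_recv = {}  # (component, thread) -> most recently received msg_id
    order = []      # message ids in order of their first SEND record

    for rec in sorted(records, key=lambda r: r.timestamp):
        ctx = (rec.component, rec.thread)
        if rec.kind == "RECV":
            last_recv[ctx] = rec.msg_id
        elif rec.kind == "SEND" and rec.msg_id not in parent:
            parent[rec.msg_id] = last_recv.get(ctx)
            order.append(rec.msg_id)

    # Group messages under their cause; a message whose cause was never
    # logged as a SEND is treated as the root of its own path.
    children = {}
    for msg in order:
        cause = parent[msg] if parent[msg] in parent else None
        children.setdefault(cause, []).append(msg)

    paths = []

    def walk(msg, prefix):
        path = prefix + [msg]
        kids = children.get(msg, [])
        if not kids:
            paths.append(path)
        for kid in kids:
            walk(kid, path)

    for root in children.get(None, []):
        walk(root, [])
    return paths


if __name__ == "__main__":
    demo = [
        LogRecord(1.0, "client", "t1", "SEND", "m1"),
        LogRecord(2.0, "web", "t1", "RECV", "m1"),
        LogRecord(3.0, "web", "t1", "SEND", "m2"),
        LogRecord(4.0, "db", "t1", "RECV", "m2"),
        LogRecord(5.0, "db", "t1", "SEND", "m3"),
    ]
    for path in build_causal_paths(demo):
        print(" -> ".join(path))
```

A real system would need a more robust rule than "most recent receive on the same thread" (for example, for thread pools or asynchronous handlers), which is presumably where the agreement with programmers on log style and content comes in.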
Keywords :
distributed processing; electronic messaging; message passing; Gingko; causal paths; distributed systems; large-scale systems; message causal paths; message traces; Algorithm design and analysis; Art; Clustering algorithms; Computer networks; Laboratories; Parallel processing; Production systems; Programming profession; Prototypes; Runtime
Conference_Title :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
DOI :
10.1109/NPC.2007.46