DocumentCode
86917
Title
Pythia: detection, localization, and diagnosis of performance problems
Author
Kanuparthy, Partha ; Lee, Daewoo ; Matthews, William ; Dovrolis, Constantine ; Zarifzadeh, Sajjad
Volume
51
Issue
11
fYear
2013
fDate
Nov-13
Firstpage
55
Lastpage
62
Abstract
Performance problem diagnosis is a critical part of network operations in ISPs. Service providers use a combination of approaches to troubleshoot the performance of their networks, such as active monitoring infrastructure and data collection (SNMP, NetFlow, router logs, table dumps, etc.) along with customer trouble tickets. Some of these approaches, however, do not scale to wide-area inter-domain networks due to the unavailability of such data; moreover, troubleshooting is either reactive (e.g., driven by customer complaints) or (typically) automated using static thresholds. In this article, we describe the design and implementation of a system for root cause analysis and localization of performance problems in ISP networks. Our approach works with legacy monitoring infrastructure (e.g., perfSONAR deployments) and does not need specialized active probing tools or network data. Our system provides a language for network operators to define performance problem signatures, and provides near-real-time performance diagnosis and localization. We describe our deployment of Pythia in perfSONAR monitors in production networks in Georgia, covering over 250 inter-domain paths.
Keywords
monitoring; performance evaluation; real-time systems; telecommunication network routing; wide area networks; ISP networks; Netflow; Pythia; SNMP; active monitoring infrastructure; active probing tools; customer complaints; customer trouble tickets; data collection; legacy monitoring infrastructure; localization; near-real-time performance diagnosis; network data; network operations; network operators; network performance; perfSONAR deployments; performance problem diagnosis; performance problem signatures; performance problems; root cause analysis; router logs; service providers; static thresholds; table dumps; wide area inter-domain networks; Databases; Internet service providers; Metasearch; Telecommunication network management; Time series analysis; Web and internet services;
fLanguage
English
Journal_Title
Communications Magazine, IEEE
Publisher
IEEE
ISSN
0163-6804
Type
jour
DOI
10.1109/MCOM.2013.6658653
Filename
6658653