Title :
Fast statistical relationship discovery in massive monitoring data
Author :
Zhang, Hui ; Chen, Haifeng ; Jiang, Guofei ; Meng, Xiaoqiao ; Yoshihira, Kenji
Author_Institution :
NEC Labs. America, Princeton, NJ
Abstract :
Today's network systems are extensively instrumented to collect a wealth of monitoring data. Statistical techniques such as regression analysis can be applied to uncover rich relationships (e.g., correlation, causality, independence) among the measurement data, which are then used for systems management tasks including fault diagnosis, configuration management, and performance analysis. However, one problem in this information mining process is the heavy computational overhead of statistical relationship discovery. In this paper, we propose a fast indexing technique that alleviates this problem by guiding the discovery process in an optimal order. We model the optimal discovery process as the classic vertex cover problem in graph theory, which is NP-complete. We use the heuristic of greedy vertex selection based on vertex degree and propose two simple algorithms for generating an estimated ranking (indexing) of the vertices (i.e., measurement points) based on the edges (the existing statistical relationships) incident to them. The two algorithms are based on random sampling, and we analyze their output accuracy as a function of the number of sampling trials. On data traces from an operational 3G mobile network, our indexing technique performed close to the optimal solution (e.g., no more than 10% discovery time) and significantly better than random discovery (e.g., 70% less discovery time) in finding a specified percentage (e.g., 90%) of the existing relationships.
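The general idea in the abstract (estimate vertex degree by random sampling, then test pairs in degree order) can be illustrated with a minimal sketch. This is not the paper's two algorithms, whose details are not given here; the function names `estimate_degree_ranking` and `discover_in_order`, the `has_relationship` callback, and the star-shaped toy graph are all assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def estimate_degree_ranking(nodes, has_relationship, trials, seed=0):
    """Rank nodes by an estimated degree obtained from random pair tests.

    Hypothetical sketch: each trial tests one uniformly sampled pair and
    credits both endpoints when a relationship is found.
    """
    rng = random.Random(seed)
    hits = defaultdict(int)
    for _ in range(trials):
        u, v = rng.sample(nodes, 2)
        if has_relationship(u, v):
            hits[u] += 1
            hits[v] += 1
    # Nodes with more sampled relationships come first.
    return sorted(nodes, key=lambda n: hits[n], reverse=True)


def discover_in_order(ranking, has_relationship):
    """Test every untested pair incident to each node, following the ranking.

    Returns a list of (tests_so_far, u, v) for each relationship found.
    """
    found, tested, tests = [], set(), 0
    for u in ranking:
        for v in ranking:
            pair = frozenset((u, v))
            if u == v or pair in tested:
                continue
            tested.add(pair)
            tests += 1
            if has_relationship(u, v):
                found.append((tests, u, v))
    return found


# Toy example (assumed): node 0 is related to every other node, nothing else.
nodes = list(range(6))
star = lambda u, v: 0 in (u, v)
ranking = estimate_degree_ranking(nodes, star, trials=200)
found = discover_in_order(ranking, star)
```

In this toy graph the hub is ranked first, so all relationships are found within the first few pair tests rather than being spread across all 15 possible pairs.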
Keywords :
3G mobile communication; data mining; optimisation; statistical analysis; telecommunication computing; telecommunication network management; 3G mobile network; NP-complete problem; configuration management; fault diagnosis; graph theory; greedy vertex selection; information mining process; monitoring data; performance analysis; random sampling; regression analysis; statistical relationship discovery; Algorithm design and analysis; Indexing; Information analysis; Instruments; Laboratories; Monitoring; National electric code; Performance analysis; Regression analysis; Sampling methods;
Conference_Title :
INFOCOM Workshops 2008, IEEE
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4244-2219-7
DOI :
10.1109/INFOCOM.2008.4544613