DocumentCode :
627557
Title :
Pattern detection in unstructured data: An experience for a virtualized IT infrastructure
Author :
Marvasti, Mazda A. ; Poghosyan, Arnak V. ; Harutyunyan, Ashot N. ; Grigoryan, Naira M.
fYear :
2013
fDate :
27-31 May 2013
Firstpage :
1048
Lastpage :
1053
Abstract :
Data-agnostic management of today's virtualized and cloud IT infrastructures motivates statistical inference from unstructured or semi-structured data. We introduce a universal approach to the determination of statistically relevant patterns in unstructured data, and then showcase its application to log data of a Virtual Center (VMware's virtualization management software). The premise of this study is that the unstructured data can be converted into events, where an event is defined by time, source, and a series of attributes. Every event can have any number of attributes, but all must have a timestamp and optionally a source of origination (be it a server, a location, a business process, etc.). The statistical relevance of the data can then be made clear by determining the joint and prior probabilities of events using a discrete probability computation. From this we construct a Directed Virtual Graph with nodes representing events and branches representing the conditional probabilities between two events. Employing information-theoretic measures, the graphs are reduced to a subset of relevant nodes and connections. Moreover, the information contained in the unstructured data set is extracted from these graphs by detecting particular patterns of interest.
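The pipeline outlined in the abstract (events, prior and joint probabilities, conditional-probability edges, information-theoretic reduction) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that events are `(timestamp, source, event_type)` tuples, that two events "co-occur" when they fall in the same fixed-width time window, that probabilities are estimated over non-empty windows only, and that pointwise mutual information is the pruning measure. The function name `build_event_graph` and the parameters `window` and `min_pmi` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log2


def build_event_graph(events, window=60, min_pmi=0.0):
    """Return {(a, b): P(b | a)} for event-type pairs that survive pruning.

    events  : iterable of (timestamp_seconds, source, event_type)
    window  : width of a co-occurrence window in seconds (assumption)
    min_pmi : keep an edge only if its pointwise mutual information
              exceeds this threshold (assumed pruning criterion)
    """
    # Group the distinct event types seen in each time window.
    buckets = {}
    for ts, _source, etype in events:
        buckets.setdefault(int(ts // window), set()).add(etype)

    n = len(buckets)  # number of non-empty windows
    prior = Counter()  # windows containing event type a
    joint = Counter()  # windows containing both a and b (ordered pairs)
    for types in buckets.values():
        prior.update(types)
        joint.update(permutations(sorted(types), 2))

    edges = {}
    for (a, b), count_ab in joint.items():
        p_a, p_b, p_ab = prior[a] / n, prior[b] / n, count_ab / n
        pmi = log2(p_ab / (p_a * p_b))
        if pmi > min_pmi:
            # Edge a -> b is weighted by the conditional probability P(b | a).
            edges[(a, b)] = p_ab / p_a
    return edges
```

In this sketch, an edge a -> b is dropped whenever the two event types co-occur no more often than independence would predict. A real deployment would likely need a direction-aware notion of co-occurrence (for example, b following a within a lag), which the abstract does not specify.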
Keywords :
directed graphs; inference mechanisms; pattern recognition; probability; statistical analysis; virtualisation; VMware virtualization management software; cloud IT infrastructures; conditional probability; data-agnostic management; directed virtual graph; discrete probability computation; information-theoretic measures; pattern detection; semistructured data; statistical inference; unstructured data set; virtual center; virtualized IT infrastructure; Correlation; Data mining; Joints; Mutual information; Nickel; Probability; Servers; Fault management; directed graph; event correlation; pattern detection; unstructured data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on
Conference_Location :
Ghent
Print_ISBN :
978-1-4673-5229-1
Type :
conf
Filename :
6573128