Title :
MINETRAC: Mining flows for unsupervised analysis & semi-supervised classification
Author :
Casas, Pedro ; Mazel, Johan ; Owezarski, Philippe
Author_Institution :
LAAS, CNRS, Toulouse, France
Abstract :
Driven by the well-known limitations of port-based and payload-based analysis techniques, the use of Machine Learning for Internet traffic analysis and classification has become a fertile research area during the past half-decade. In this paper we introduce MINETRAC, a combination of unsupervised and semi-supervised machine learning techniques capable of identifying and classifying different classes of IP flows sharing similar characteristics. The unsupervised analysis is accomplished by means of robust clustering techniques, using Sub-Space Clustering, Evidence Accumulation, and Hierarchical Clustering algorithms to explore the inter-flow structure. MINETRAC identifies natural groupings of traffic flows by combining the evidence of data structure provided by different partitions of the same set of traffic flows. Automatic classification is performed by means of semi-supervised learning, using only a small fraction of ground-truth flows to map each identified cluster to its most probable originating network service or application. We evaluate the performance of MINETRAC using real traffic traces, and compare it against previously proposed clustering-based flow analysis methods and supervised/semi-supervised classification approaches.
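The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the per-sub-space clustering algorithm, so a small k-means is swapped in here as an assumption, sub-spaces are assumed to be all pairs of features, and the function names, the synthetic flow features, and the "web"/"p2p" labels are hypothetical.

```python
import itertools
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def kmeans_labels(X, k, rng, n_iter=20):
    """Tiny k-means (assumed stand-in for the per-sub-space clustering)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def evidence_accumulation(X, k=2, seed=0):
    """Cluster every 2-feature sub-space, then combine the partitions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    subspaces = list(itertools.combinations(range(d), 2))
    coassoc = np.zeros((n, n))
    for dims in subspaces:
        part = kmeans_labels(X[:, list(dims)], k, rng)
        # Count how often each pair of flows lands in the same cluster.
        coassoc += part[:, None] == part[None, :]
    coassoc /= len(subspaces)
    # Turn accumulated evidence into a distance and cut a hierarchy on it.
    dist = 1.0 - coassoc
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=k, criterion="maxclust")


def label_clusters(clusters, known):
    """Map each cluster to the majority label of its ground-truth flows."""
    votes = {}
    for idx, app in known.items():
        votes.setdefault(int(clusters[idx]), Counter())[app] += 1
    return {c: v.most_common(1)[0][0] for c, v in votes.items()}


# Synthetic "flows": two well-separated groups in three hypothetical features.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 0.5, (20, 3)),
               data_rng.normal(5.0, 0.5, (20, 3))])
clusters = evidence_accumulation(X, k=2)
# Only three flows carry ground-truth labels.
mapping = label_clusters(clusters, {0: "web", 1: "web", 20: "p2p"})
predicted = [mapping[int(c)] for c in clusters]
```

Flows that repeatedly fall into the same cluster across sub-spaces end up close in the co-association distance, so the final hierarchical cut reflects agreement among partitions rather than any single one. A cluster containing no labeled flow would have no entry in the mapping and would stay unclassified in this sketch.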
Keywords :
Internet; learning (artificial intelligence); telecommunication traffic; IP flows sharing; Internet traffic analysis; MINETRAC; automatic classification; evidence accumulation; hierarchical clustering algorithms; inter-flows structure; mining flows; payload-based analysis techniques; port-based analysis techniques; semi-supervised classification; semi-supervised machine learning techniques; sub-space clustering; unsupervised analysis; unsupervised machine learning techniques; well-known limitations; Accuracy; Algorithm design and analysis; Clustering algorithms; Computational modeling; Partitioning algorithms; Training; Vegetation; Evidence Accumulation; Hierarchical Clustering; Semi-Supervised Traffic Classification; Sub-Space Clustering; Unsupervised Traffic Analysis;
Conference_Title :
Teletraffic Congress (ITC), 2011 23rd International
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4577-1187-9
Electronic_ISBN :
978-0-9836283-0-9