DocumentCode :
2141068
Title :
A Comparison of three machine learning techniques for encrypted network traffic analysis
Author :
Arndt, Daniel J. ; Zincir-Heywood, A. Nur
Author_Institution :
Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
fYear :
2011
fDate :
11-15 April 2011
Firstpage :
107
Lastpage :
114
Abstract :
This work evaluates three methods for encrypted traffic analysis without using IP addresses, port numbers, or payload information. To this end, binary identification of SSH vs. non-SSH traffic is used as a case study, since the plain-text initiation of the SSH protocol allows us to obtain data sets with a reliable ground truth. The methods are subjected to several tests using different export options, feature sets, and training and test traffic traces, for a total of 128 different configurations. Of particular interest are test cases that use a test set from a different network than the one on which the model was trained, i.e., tests of the robustness of the trained models. Results show that the multi-objective genetic algorithm (MOGA) based trained model achieves the best performance among the three methods when each approach is tested on traffic traces captured on the same network as the training trace. On the other hand, C4.5 achieves the best results among the three methods when tested on traffic traces captured on entirely different networks than the training trace. Furthermore, it is shown that continuous sampling of the training data is no better than random sampling, but that the choice of training data strongly affects how well the classifiers perform on traffic traces captured from different networks. Moreover, the C4.5-based approach provides the fastest and most human-readable model, whereas the MOGA greatly reduces the complexity of the k-means clustering algorithm.
Keywords :
IP networks; cryptography; genetic algorithms; learning (artificial intelligence); pattern clustering; protocols; telecommunication traffic; C4.5-based approach; IP addresses; MOGA-based trained model; SSH protocol; binary identification; encrypted network traffic analysis; feature sets; k-mean clustering; machine learning techniques; multiobjective genetic algorithm; nonSSH traffic; payload information; port number; random sampling; traffic traces; Clustering algorithms; Equations; Mathematical model; Payloads; Protocols; Robustness; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence for Security and Defense Applications (CISDA), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9939-7
Type :
conf
DOI :
10.1109/CISDA.2011.5945941
Filename :
5945941