Title :
Comparative study of machine learning techniques for pre-processing of network intrusion data
Author :
Faiza Rahat;Syed Nadeem Ahsan
Author_Institution :
Karachi Institute of Power Engineering, Paradise Point, Karachi, Pakistan
Abstract :
Machine learning is widely used for network intrusion detection, but the data it relies on suffers from two problems inherent in network traffic data: a large feature set and class imbalance. This paper evaluates the performance of different strategies for mitigating both problems. The data used for classification was KDDCUP'99, a benchmark data set for intrusion detection that suffers greatly from class imbalance. Noise was also added to the data to evaluate classifier performance on noisy data. Four scenarios, each a different combination of sampling, feature set reduction and classification, are tested, and classifiers are used to evaluate the performance of each scenario. The feature set was reduced from forty-one features to nine. Stratified remove folds and Resampling were applied to address the class imbalance problem. The results show that the Nearest Neighbor and J48 classifiers are best suited for real-time detection with pre-processing, whereas Gain Ratio is suitable for feature selection.
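As an illustration of the Gain Ratio criterion named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes every feature is already discrete (continuous KDDCUP'99 attributes would need binning first). The helper names `entropy`, `gain_ratio` and `top_k_features`, and the four-row toy data set, are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A feature with a single value carries no information; avoid dividing by zero.
    return info_gain / split_info if split_info > 0 else 0.0


def top_k_features(columns, labels, k):
    """Rank named feature columns by gain ratio and keep the k best."""
    ranked = sorted(columns, key=lambda name: gain_ratio(columns[name], labels), reverse=True)
    return ranked[:k]


# Hypothetical toy data (not taken from KDDCUP'99): two discrete features, four records.
toy_columns = {
    "protocol": ["tcp", "tcp", "udp", "udp"],
    "flag": ["SF", "S0", "SF", "S0"],
}
toy_labels = ["normal", "attack", "normal", "attack"]

print(top_k_features(toy_columns, toy_labels, k=1))
```

In this toy data, "flag" separates the two classes perfectly while "protocol" carries no class information, so "flag" is ranked first. Reducing forty-one features to nine, as the abstract describes, would correspond to calling `top_k_features` with `k=9` on the full attribute set, under the same discretization assumption.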
Keywords :
"Intrusion detection","Feature extraction","Machine learning algorithms","Real-time systems","Data mining","Classification algorithms","Benchmark testing"
Conference_Title :
Open Source Systems & Technologies (ICOSST), 2015 International Conference on
DOI :
10.1109/ICOSST.2015.7396401