DocumentCode
2169271
Title
Effect of Feature Selection, SMOTE and under Sampling on Class Imbalance Classification
Author
Qazi, Nadeem ; Raza, Kamran
Author_Institution
Fac. of Eng. Sci. & Technol., IQRA Univ., Pakistan
fYear
2012
fDate
28-30 March 2012
Firstpage
145
Lastpage
150
Abstract
Accurate identification of network intrusions is one of the biggest challenges for Network Intrusion Detection (NID) systems. In recent years, machine learning classification techniques have been used to identify network intrusions precisely. However, the multiclass distribution in network intrusion detection data has been found to be highly skewed, and this class imbalance degrades classification accuracy. The work presented in this paper not only explores the role of attribute selection in improving classification accuracy but also investigates the class imbalance problem using the Synthetic Minority Over-sampling Technique (SMOTE) and under-sampling of the major classes. Classification performance is then evaluated over several types of classifiers. The outcome of this work is that, for the class-imbalanced data set, the under-sampling technique is more effective than SMOTE in detecting minor classes. It was also found that the decision tree algorithms (JRIP) and Naïve Bayes are more accurate classifiers than the radial basis neural network and the support vector machine. However, no single algorithm is sufficient for multiclass classification, and this work proposes that a combination of classifiers consisting of Naïve Bayes and JRIP could be used for the classification of minor classes in an imbalanced data set of an intrusion detection system.
Keywords
Bayes methods; computer network security; decision trees; feature extraction; learning (artificial intelligence); pattern classification; radial basis function networks; support vector machines; JRIP; NID systems; Naive Bayes method; SMOTE; attribute selection; class imbalance classification; classification accuracy problem; decision tree algorithms; feature selection; machine learning classification techniques; multiclass classification; multiclass distribution; network intrusion detection systems; network intrusions identification; radial basis neural network; support vector machine; synthetic minority over-sampling; under-sampling technique; Accuracy; Classification algorithms; Decision trees; Intrusion detection; Machine learning; Machine learning algorithms; Support vector machines; Class imbalance; Feature Selection; Network intrusion; Support Vector Machines;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Modelling and Simulation (UKSim), 2012 UKSim 14th International Conference on
Conference_Location
Cambridge
Print_ISBN
978-1-4673-1366-7
Type
conf
DOI
10.1109/UKSim.2012.116
Filename
6205441