Title :
Effect of Feature Selection, SMOTE and under Sampling on Class Imbalance Classification
Author :
Qazi, Nadeem ; Raza, Kamran
Author_Institution :
Fac. of Eng. Sci. & Technol., IQRA Univ., Pakistan
Abstract :
Accurate identification of network intrusions is one of the biggest challenges for Network Intrusion Detection (NID) systems. In recent years, machine learning classification techniques have been used to identify network intrusions precisely. However, the multiclass distribution in network intrusion detection data has been found to be highly skewed, and this class imbalance degrades classification accuracy. The work presented in this paper not only explores the role of attribute selection in improving classification accuracy but also investigates the class imbalance problem using the Synthetic Minority Over-sampling Technique (SMOTE) and under-sampling of the major classes. Classification performance is then evaluated over several types of classifiers. The outcome of this work is that, for the class-imbalanced data set, the under-sampling technique is more effective than SMOTE in detecting minor classes. This research also found that the decision tree algorithm (JRIP) and Naïve Bayes are more accurate classifiers than the radial basis function neural network and the support vector machine. However, no single algorithm suffices for multiclass classification, and this work proposes that a combination of classifiers consisting of Naïve Bayes and JRIP could be used to classify the minor classes in an imbalanced intrusion detection data set.
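The two rebalancing strategies compared in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation; the function names `smote` and `undersample_majority` are hypothetical, and the defaults (k = 5 nearest neighbours, Euclidean distance, under-sampling every class down to the size of the smallest class) are assumptions chosen for illustration. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours, while random under-sampling discards samples from the larger classes.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, k=5, n_new=None, seed=0):
    """Hypothetical SMOTE sketch: return n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    if n_new is None:
        n_new = len(minority)  # assumption: double the minority class by default
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x and keep the k closest.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: squared_distance(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def undersample_majority(X, y, target=None, seed=0):
    """Hypothetical random under-sampling sketch.

    Classes larger than `target` are randomly reduced to `target` samples;
    smaller classes are kept unchanged. By default (an assumption), `target`
    is the size of the smallest class.
    """
    rng = random.Random(seed)
    if target is None:
        target = min(Counter(y).values())
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, min(target, len(rows))):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = [[float(i), float(i % 3)] for i in range(10)]
    y = ["major"] * 8 + ["minor"] * 2
    Xs, ys = undersample_majority(X, y)
    print(Counter(ys))
    print(smote([row for row, label in zip(X, y) if label == "minor"], k=1, n_new=3))
```

The design difference matters for the abstract's finding: under-sampling removes real majority-class records, whereas SMOTE only adds points inside the region already spanned by the minority samples, so it cannot introduce minority patterns that the original data does not already suggest.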
Keywords :
Bayes methods; computer network security; decision trees; feature extraction; learning (artificial intelligence); pattern classification; radial basis function networks; support vector machines; JRIP; NID systems; Naive Bayes method; SMOTE; attribute selection; class imbalance classification; classification accuracy problem; decision tree algorithms; feature selection; machine learning classification techniques; multiclass classification; multiclass distribution; network intrusion detection systems; network intrusions identification; radial basis neural network; support vector machine; synthetic minority over-sampling; under-sampling technique; Accuracy; Classification algorithms; Decision trees; Intrusion detection; Machine learning; Machine learning algorithms; Support vector machines; Class imbalance; Feature Selection; Network intrusion; Support Vector Machines;
Conference_Title :
Computer Modelling and Simulation (UKSim), 2012 UKSim 14th International Conference on
Conference_Location :
Cambridge
Print_ISBN :
978-1-4673-1366-7
DOI :
10.1109/UKSim.2012.116