DocumentCode
3645149
Title
A new ensemble-feature-selection framework for intrusion detection
Author
Hai Thanh Nguyen;Katrin Franke;Slobodan Petrović
Author_Institution
Norwegian Information Security Laboratory, Gjø
fYear
2011
Firstpage
213
Lastpage
218
Abstract
Feature selection is an important part of a pattern recognition system. A feature selection method must be general enough to find representative features in the training data, which are then used to classify test patterns. The situation in which the features selected from the training data differ substantially from the representative features of the testing data is called over-selecting. The main causes of over-selecting are non-comprehensive consideration of the statistical properties of the training data, heuristic search strategies for feature selection, and a small training sample size. In this paper, we show how over-selecting influences the over-fitting of machine learning algorithms. We propose a new framework that addresses the principal causes of over-selecting and thus reduces the chance of over-fitting. Our framework, which we call the Ensemble Feature Selection measure (EnFS), considers many statistical properties of a given data set simultaneously by combining several feature selection methods used in the filter model. From the chosen feature selection measures, a new combined measure is constructed. We also propose a new search algorithm that guarantees globally optimal feature subsets with respect to the constructed measure. The search is based on solving a mixed 0-1 linear programming (M01LP) problem with the branch-and-bound algorithm. In this M01LP problem, the numbers of constraints and variables are linear in the number of features in the full set. To evaluate the quality of our EnFS measure, we chose the design of an intrusion detection system (IDS) as an application. Experimental results on the KDD CUP'99 benchmark data set for IDS show that our EnFS measure can reduce over-fitting by addressing over-selecting.
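The abstract describes a search that returns a globally optimal feature subset under a combined measure. The Python sketch below illustrates that idea under simplifying assumptions and is not the authors' method. It assumes, hypothetically, that the combined measure is linear-fractional in the 0-1 selection vector x, f(x) = (a0 + Σ a_i x_i) / (b0 + Σ b_i x_i) with b0 > 0 and b_i ≥ 0, and it swaps in Dinkelbach's parametric algorithm in place of the paper's M01LP branch-and-bound search. The helper names (ratio, dinkelbach_01, brute_force), the per-feature scores and the averaging used to build a_i are all illustrative assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple


def ratio(a0: float, a: Sequence[float], b0: float, b: Sequence[float],
          x: Sequence[int]) -> float:
    """Hypothetical measure f(x) = (a0 + sum_i a_i x_i) / (b0 + sum_i b_i x_i)."""
    num = a0 + sum(ai * xi for ai, xi in zip(a, x))
    den = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return num / den


def dinkelbach_01(a0: float, a: Sequence[float], b0: float, b: Sequence[float],
                  tol: float = 1e-12, max_iter: int = 1000) -> Tuple[List[int], float]:
    """Globally maximise f over x in {0,1}^n with Dinkelbach's method.

    Assumes b0 > 0 and every b_i >= 0, so the denominator is always positive.
    """
    x = [1] * len(a)                  # start from the full feature set
    lam = ratio(a0, a, b0, b, x)
    for _ in range(max_iter):
        # Parametric subproblem max_x N(x) - lam * D(x). It is separable and
        # unconstrained, so x_i = 1 exactly when a_i - lam * b_i > 0.
        x_new = [1 if ai - lam * bi > 0 else 0 for ai, bi in zip(a, b)]
        num = a0 + sum(ai * xi for ai, xi in zip(a, x_new))
        den = b0 + sum(bi * xi for bi, xi in zip(b, x_new))
        if num - lam * den <= tol:    # F(lam) ~ 0: lam is the global maximum
            return x, lam
        x, lam = x_new, num / den     # lam strictly increases each round
    raise RuntimeError("no convergence within max_iter")


def brute_force(a0: float, a: Sequence[float], b0: float,
                b: Sequence[float]) -> Tuple[List[int], float]:
    """Exhaustive check over all 2^n subsets (small n only)."""
    best = max(product((0, 1), repeat=len(a)),
               key=lambda x: ratio(a0, a, b0, b, x))
    return list(best), ratio(a0, a, b0, b, best)


# Hypothetical normalised scores of 4 features under 3 filter measures.
scores = [
    [0.9, 0.2, 0.7, 0.1],
    [0.8, 0.3, 0.6, 0.2],
    [0.7, 0.1, 0.8, 0.0],
]
a = [sum(col) / len(col) for col in zip(*scores)]  # averaged scores
b = [1.0] * len(a)                                  # each feature costs 1
x, lam = dinkelbach_01(0.0, a, 1.0, b)              # f = sum(a_S) / (1 + |S|)
print(x, round(lam, 3))  # -> [1, 0, 1, 0] 0.5
```

In this unconstrained toy form each Dinkelbach subproblem reduces to a sign test, so no solver is needed; brute_force only confirms global optimality on small inputs. With richer measures or constraints the subproblem stops being separable, which is where an exact mixed 0-1 method such as the branch-and-bound search named in the abstract becomes relevant.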
Keywords
"Testing","Training data","Training","Polynomials","Computational modeling","Programming","Machine learning algorithms"
Publisher
ieee
Conference_Titel
Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
ISSN
2164-7143
Print_ISBN
978-1-4577-1676-8
Electronic_ISBN
2164-7151
Type
conf
DOI
10.1109/ISDA.2011.6121657
Filename
6121657
Link To Document