Title :
Multi-objective genetic algorithm approach to feature subset optimization
Author_Institution :
Dept. of Comput. Sci. & Eng., Guru Jambheshwar Univ. of Sci. & Technol., Hisar, India
Abstract :
The presence of unimportant and superfluous features in datasets motivates researchers to devise novel feature selection strategies. Feature selection is multi-objective in nature, so optimizing feature subsets with respect to any single evaluation criterion is not sufficient [1]. Moreover, discovering a single best subset of features is of limited interest. Finding several feature subsets that reflect a trade-off among several objective criteria is more beneficial, as it gives users a broad choice for feature subset selection. Thus, in order to combine several feature selection criteria, we propose multi-objective optimization of feature subsets using a multi-objective genetic algorithm. This work is an attempt to discover non-dominated feature subsets of small cardinality with high predictive power and minimal redundancy. For this purpose we use NSGA II, a well-known Multi-Objective Genetic Algorithm (MOGA), to discover non-dominated feature subsets for the task of classification. The main contribution of this paper is the design of a novel multi-objective fitness function with information gain, mutual correlation and size of the feature subset as the optimization criteria. The suggested approach is validated on seven datasets from the UCI machine learning repository. Support Vector Machine, a well-tested classification algorithm, is used to measure the classification accuracy. The results confirm that the proposed system is able to discover diverse optimal feature subsets that are well spread in the overall feature space, and that the classification accuracy of the resulting feature subsets is reasonably high.
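The three criteria named in the abstract, and the notion of a non-dominated feature subset, can be illustrated with a minimal Python sketch. It is not the authors' fitness function: the abstract does not give the formulas, so the sketch assumes total information gain over the subset (to be maximized), mean absolute Pearson correlation between feature pairs as the redundancy measure (to be minimized), and subset size (to be minimized). It also replaces the NSGA II search with exhaustive enumeration over a tiny hypothetical dataset, which is feasible only for a handful of features. All function names and data values are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(column):
        part = [y for x, y in zip(column, labels) if x == v]
        remainder += len(part) / n * entropy(part)
    return entropy(labels) - remainder


def abs_correlation(a, b):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / sqrt(va * vb)


def objectives(subset, columns, labels):
    """Objective vector in which every component is minimized (assumed form)."""
    gain = sum(information_gain(columns[i], labels) for i in subset)
    pairs = list(combinations(subset, 2))
    redundancy = 0.0
    if pairs:
        redundancy = sum(
            abs_correlation(columns[i], columns[j]) for i, j in pairs
        ) / len(pairs)
    return (-gain, redundancy, len(subset))


def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def non_dominated(scored):
    """Keep the (subset, objectives) pairs that no other pair dominates."""
    return [s for s in scored if not any(dominates(o[1], s[1]) for o in scored)]


# Hypothetical toy data: four discrete features, eight samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 0: matches the class labels
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 1: exact copy of feature 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 2: carries no class information
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 3: partially informative
]

# Exhaustive enumeration stands in for the NSGA II search here.
scored = [
    (subset, objectives(subset, columns, labels))
    for k in range(1, len(columns) + 1)
    for subset in combinations(range(len(columns)), k)
]
front = non_dominated(scored)

for subset, (neg_gain, redundancy, size) in front:
    print(subset, round(-neg_gain, 3), round(redundancy, 3), size)
```

Each printed line is one non-dominated subset with its total information gain, mean pairwise correlation and size. In the approach described in the abstract, candidates of this kind would be evolved by NSGA II rather than enumerated, and their classification accuracy would then be measured with a Support Vector Machine.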
Keywords :
correlation methods; feature selection; genetic algorithms; pattern classification; support vector machines; MOGA; NSGA II; UCI machine learning repository; cardinality; classification accuracy; classification algorithm; feature selection criteria; feature selection strategies; feature space; feature subset optimization; feature subset selection; feature subset size; information gain; multiobjective fitness function; multiobjective genetic algorithm; multiobjective optimization; multioptimization criteria; mutual correlation; nondominated feature subsets; objective criteria; predictive power; support vector machine; Accuracy; Classification algorithms; Correlation; Filtering algorithms; Genetic algorithms; Linear programming; Optimization; Feature subset selection; Multi-Objective Genetic Algorithm; Multi-objective optimization; Non-dominated solutions;
Conference_Title :
Advance Computing Conference (IACC), 2014 IEEE International
Conference_Location :
Gurgaon
Print_ISBN :
978-1-4799-2571-1
DOI :
10.1109/IAdCC.2014.6779383