Title :
Non-Myopic Feature Selection Method for Continuous Attributes and Discrete Class
Author :
Mejía-Lavalle, Manuel ; Rodríguez, Guillermo ; Arroyo, Gustavo
Author_Institution :
Inst. de Investigaciones Electricas, Morelos
Abstract :
Diverse feature selection ranking methods and metrics currently exist for databases with purely discrete data (attributes and class) or purely continuous data. However, little work has been done for the case of continuous attributes with a discrete class while also evaluating attribute subsets in a non-myopic fashion, that is, taking into account their inter-dependencies or interactions. The usual approach is to discretize the data and then apply a traditional feature selection method; the results, however, vary depending on the discretization method used. Additionally, evaluating attributes only in isolation is likely to give poor results, because attribute inter-dependencies are not considered. We propose a metric and method for feature selection on continuous data with a discrete class, inspired by Shannon's entropy and information gain, which overcomes the above problems. In our experiments with synthetic and real databases, the proposed method proved to be fast and produced near-optimal solutions while selecting few attributes.
Keywords :
entropy; feature extraction; Shannon entropy; continuous attribute; discrete class; discretization method; isolated attribute; nonmyopic feature selection ranking; pure continuous data; pure discrete data; Accuracy; Computer science; Data mining; Entropy; Filters; Information theory; Prediction algorithms; Predictive models; Spatial databases; Supervised learning;
Conference_Title :
Current Trends in Computer Science, 2007. ENC 2007. Eighth Mexican International Conference on
Conference_Location :
Michoacan
Print_ISBN :
978-0-7695-2899-1
DOI :
10.1109/ENC.2007.12