Title :
Robust ensemble feature selection for high dimensional data sets
Author :
Ben Brahim, Afef ; Limam, Mohamed
Author_Institution :
LARODEC, Univ. of Tunis, Tunis, Tunisia
Abstract :
Feature selection is an important and frequently used data preprocessing technique for data mining on large-scale data sets. Several feature selection methods exist in the literature; each uses a specific feature evaluation criterion and may produce a different feature subset even when applied to the same data set. No single resulting subset is better than the others, but each is the best subset of the whole feature space according to the criterion that produced it. Taking advantage of different feature selection methods simultaneously is a challenging data mining problem. Recently, the concept of ensemble feature selection has been introduced to help solve this problem: multiple feature selections are combined in order to produce more robust feature subsets and better classification results. However, one of the most critical decisions when performing ensemble feature selection is the aggregation technique used to combine the feature lists produced by the multiple algorithms into a single decision for each feature. In this paper, we propose a robust feature aggregation technique to combine the results of three different filter methods. Our aggregation technique measures each feature selection algorithm's confidence and its conflict with the other algorithms in order to assign a reliability factor that guides the final feature selection. Experiments on high-dimensional data sets show that the proposed approach outperforms the single feature selection algorithms as well as two well-known aggregation methods in terms of classification performance.
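To make the aggregation idea concrete, the following is a minimal Python sketch of combining three filter rankings with per-method weights. It is not the authors' technique: the abstract does not define its confidence, conflict, or reliability measures, so the conflict measure used here (mean absolute rank difference), the weight formula 1 / (1 + conflict), and the function names ranks, disagreement, and aggregate are all assumptions made purely for illustration.

```python
def ranks(scores):
    """Rank features by descending score; rank 0 is the best feature."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, feature in enumerate(order):
        result[feature] = position
    return result


def disagreement(rank_a, rank_b):
    """Assumed conflict measure: mean absolute rank difference."""
    return sum(abs(a - b) for a, b in zip(rank_a, rank_b)) / len(rank_a)


def aggregate(score_lists, k):
    """Weighted rank aggregation over several filter methods (illustrative).

    score_lists holds one list of feature scores per filter method.
    Returns the indices of the k selected features and the method weights.
    """
    all_ranks = [ranks(scores) for scores in score_lists]
    m = len(all_ranks)

    # A method that conflicts more with the others gets a smaller weight.
    # This weighting is an assumption, not the paper's reliability factor.
    weights = []
    for i in range(m):
        conflict = sum(
            disagreement(all_ranks[i], all_ranks[j]) for j in range(m) if j != i
        ) / (m - 1)
        weights.append(1.0 / (1.0 + conflict))
    total = sum(weights)
    weights = [w / total for w in weights]

    # Combine the rankings into one weighted mean rank per feature.
    n = len(all_ranks[0])
    combined = [
        sum(w * r[f] for w, r in zip(weights, all_ranks)) for f in range(n)
    ]
    selected = sorted(range(n), key=lambda f: combined[f])[:k]
    return selected, weights
```

Calling aggregate with three score lists (one per filter method) and k = 2 returns the two features with the lowest weighted mean rank, together with the normalized weight given to each method.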
Keywords :
data mining; pattern classification; aggregation technique; confidence measurement; conflict measurement; data mining problem; data preprocessing; feature subsets; filter methods; high dimensional data sets; reliability factor; robust ensemble feature selection; Breast cancer; Data mining; Decision trees; Machine learning algorithms; Robustness; Support vector machines;
Conference_Title :
High Performance Computing and Simulation (HPCS), 2013 International Conference on
Conference_Location :
Helsinki
Print_ISBN :
978-1-4799-0836-3
DOI :
10.1109/HPCSim.2013.6641406