DocumentCode :
2456574
Title :
Applying Permutation Tests for Assessing the Statistical Significance of Wrapper Based Feature Selection
Author :
Airola, Antti ; Pahikkala, Tapio ; Boberg, Jorma ; Salakoski, Tapio
Author_Institution :
Dept. of Inf. Technol., Univ. of Turku, Turku, Finland
fYear :
2010
fDate :
12-14 Dec. 2010
Firstpage :
989
Lastpage :
994
Abstract :
Feature selection is commonly used in bioinformatics applications, such as gene selection from DNA microarray data. Recently, wrapper methods have been proposed as an improvement over the traditionally used filter-based feature selection methods. In wrapper methods, the quality of a feature set is often measured by the cross-validation performance of a machine learning method trained on those features. This can lead to overfitting: the cross-validation performance on the final selected feature set may be high even when the selected features are in fact not informative. Evaluating the statistical significance of the obtained results is therefore of major concern. Non-parametric permutation tests have previously been used as a univariate filter for selecting individual features. In contrast, we propose using such tests to measure the statistical significance of the whole selection process carried out by a wrapper method. We achieve computational efficiency by using a regularized least-squares based wrapper method, which combines a state-of-the-art classifier with matrix calculus based computational shortcuts for greedy forward feature selection. Permutation tests prove to be a practical tool for estimating the significance of the obtained results, as shown in simulations and experiments on two DNA microarray data sets.
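The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes labels in {-1, +1}, a linear kernel, and leave-one-out accuracy as the selection criterion, and it refits the model from scratch for every candidate feature instead of using the matrix calculus shortcuts the abstract mentions. All function and parameter names (`loo_accuracy`, `greedy_forward_selection`, `permutation_p_value`, `lam`, `k`) are hypothetical.

```python
import numpy as np


def loo_accuracy(X, y, lam=1.0):
    """Leave-one-out accuracy of regularized least squares, labels in {-1, +1}."""
    n = X.shape[0]
    # Dual-form hat matrix H = K (K + lam*I)^-1 with a linear kernel K = X X^T.
    K = X @ X.T
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    fitted = H @ y
    h = np.diag(H)
    # Closed-form leave-one-out predictions for ridge regression.
    loo = (fitted - h * y) / (1.0 - h)
    return float(np.mean(np.sign(loo) == y))


def greedy_forward_selection(X, y, k, lam=1.0):
    """Greedily add the feature that most improves leave-one-out accuracy."""
    selected = []
    best_score = 0.0
    for _ in range(k):
        best_j, best_score = None, -1.0
        for j in range(X.shape[1]):
            if j in selected:
                continue
            score = loo_accuracy(X[:, selected + [j]], y, lam)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected, best_score


def permutation_p_value(X, y, k, n_permutations=100, lam=1.0, seed=0):
    """Permutation test over the whole selection process.

    The complete wrapper is rerun on every label permutation, so the null
    distribution reflects the optimism introduced by the selection itself.
    """
    rng = np.random.default_rng(seed)
    _, observed = greedy_forward_selection(X, y, k, lam)
    at_least_as_good = 0
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)
        _, score = greedy_forward_selection(X, y_perm, k, lam)
        if score >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value
```

The key design point is that the labels are permuted before selection starts and the full selection is repeated for each permutation. Permuting labels only after the features have been chosen would test the classifier, not the selection process.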
Keywords :
bioinformatics; data mining; feature extraction; greedy algorithms; learning (artificial intelligence); least squares approximations; statistical analysis; DNA micro array data; bioinformatics applications; cross validation performance; filter based feature selection methods; gene selection; greedy forward feature selection; machine learning method; nonparametric permutation test; permutation tests; regularized least squares based wrapper method; selection process; statistical significance; wrapper based feature selection; wrapper methods; Breast cancer; Colon; DNA; Prediction algorithms; Training; Training data; feature selection; permutation test; wrapper methods;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-9211-4
Type :
conf
DOI :
10.1109/ICMLA.2010.158
Filename :
5708982