DocumentCode :
570217
Title :
A review of the stability of feature selection techniques for bioinformatics data
Author :
Awada, Wael ; Khoshgoftaar, Taghi M. ; Dittman, David ; Wald, Randall ; Napolitano, Amri
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2012
fDate :
8-10 Aug. 2012
Firstpage :
356
Lastpage :
363
Abstract :
Feature selection is an important step in data mining and is used in various domains including genetics, medicine, and bioinformatics. Choosing the important features (genes) is essential for the discovery of new knowledge hidden within the genetic code as well as the identification of important biomarkers. Although feature selection methods can help sort through large numbers of genes based on their relevance to the problem at hand, the results generated tend to be unstable and thus cannot be reproduced in other experiments. Relatedly, research interest in the stability of feature ranking methods has grown recently and researchers have produced experimental designs for testing the stability of feature selection, creating new metrics for measuring stability and new techniques designed to improve the stability of the feature selection process. In this paper, we will introduce the role of stability in feature selection with DNA microarray data. We list various ways of improving feature ranking stability, and discuss feature selection techniques, specifically explaining ensemble feature ranking and presenting various ensemble feature ranking aggregation methods. Finally, we discuss experimental procedures such as dataset perturbation, fixed overlap partitioning, and cross validation procedures that help researchers analyze and measure the stability of feature ranking methods. Throughout this work, we investigate current research in the field and discuss possible avenues of continuing such research efforts.
Keywords :
bioinformatics; data analysis; data mining; genetics; stability; DNA microarray data; bioinformatics data; cross validation procedures; dataset perturbation; ensemble feature ranking aggregation methods; feature ranking methods; feature ranking stability; feature selection; fixed overlap partitioning; genetic code; important biomarkers; knowledge discovery; medicine; Biomarkers; DNA; Stability criteria; Testing; Thermal stability
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4673-2282-9
Electronic_ISBN :
978-1-4673-2283-6
Type :
conf
DOI :
10.1109/IRI.2012.6303031
Filename :
6303031