• DocumentCode
    570217
  • Title
    A review of the stability of feature selection techniques for bioinformatics data
  • Author
    Awada, Wael ; Khoshgoftaar, Taghi M. ; Dittman, David ; Wald, Randall ; Napolitano, Amri
  • Author_Institution
    Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2012
  • fDate
    8-10 Aug. 2012
  • Firstpage
    356
  • Lastpage
    363
  • Abstract
    Feature selection is an important step in data mining and is used in various domains including genetics, medicine, and bioinformatics. Choosing the important features (genes) is essential for the discovery of new knowledge hidden within the genetic code as well as the identification of important biomarkers. Although feature selection methods can help sort through large numbers of genes based on their relevance to the problem at hand, the results generated tend to be unstable and thus cannot be reproduced in other experiments. Consequently, research interest in the stability of feature ranking methods has grown recently, and researchers have produced experimental designs for testing the stability of feature selection, new metrics for measuring stability, and new techniques designed to improve the stability of the feature selection process. In this paper, we introduce the role of stability in feature selection with DNA microarray data. We list various ways of improving feature ranking stability and discuss feature selection techniques, specifically explaining ensemble feature ranking and presenting various ensemble feature ranking aggregation methods. Finally, we discuss experimental procedures such as dataset perturbation, fixed overlap partitioning, and cross-validation procedures that help researchers analyze and measure the stability of feature ranking methods. Throughout this work, we investigate current research in the field and discuss possible avenues for continuing such research efforts.
  • Keywords
    bioinformatics; data analysis; data mining; genetics; stability; DNA microarray data; bioinformatics data; cross validation procedures; dataset perturbation; ensemble feature ranking aggregation methods; feature ranking methods; feature ranking stability; feature selection; fixed overlap partitioning; genetic code; important biomarkers; knowledge discovery; medicine; Biomarkers; DNA; Stability criteria; Testing; Thermal stability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4673-2282-9
  • Electronic_ISBN
    978-1-4673-2283-6
  • Type
    conf
  • DOI
    10.1109/IRI.2012.6303031
  • Filename
    6303031