• DocumentCode
    1831877
  • Title
    A survey of stability analysis of feature subset selection techniques
  • Author
    Khoshgoftaar, Taghi M.; Fazelpour, Alireza; Wang, Huanjing; Wald, Randall
  • fYear
    2013
  • fDate
    14-16 Aug. 2013
  • Firstpage
    424
  • Lastpage
    431
  • Abstract
    With the proliferation of high-dimensional datasets across many application domains in recent years, feature selection has become an important data mining task because it can improve both classification performance and computational efficiency. The chosen feature subset matters not only because it can improve classification performance, but also because in some domains, knowing the most important features is an end in itself. In this latter case, one important property of a feature selection method is stability, which refers to the insensitivity (robustness) of the selected features to small changes in the training dataset. In this survey paper, we discuss the problem of stability, its importance, and various stability measures used to evaluate feature subsets. We place special focus on stability as it applies to subset evaluation approaches (whether the subsets are selected through filter-based or wrapper-based subset selection techniques) rather than to feature rankers, because subset evaluation stability poses challenges that have received less research attention. We also discuss one domain where subset evaluation, and its stability, is particularly important but where subset-based feature selection has previously received relatively little attention: Big Data originating from bioinformatics.
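    As a minimal sketch of what a subset stability measure computes, assuming a subset-based selector has already been run on several perturbed copies (e.g., resamples) of the training data, the snippet below averages the pairwise similarity between the selected subsets. It uses two illustrative measures, the Jaccard index and a similarity derived from the Hamming distance between 0/1 selection masks; these are common examples, not necessarily the exact formulations surveyed in the paper. The function names and the example subsets are hypothetical. Both measures accept subsets of different sizes, which is relevant to subset evaluation approaches because they need not return a fixed number of features.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty subsets are treated as identical
    return len(a & b) / len(union)


def hamming_similarity(a, b, n_features):
    """1 - normalized Hamming distance between the subsets' 0/1 selection masks."""
    a, b = set(a), set(b)
    # Features chosen in exactly one subset are the positions where the masks differ.
    return 1.0 - len(a ^ b) / n_features


def stability(subsets, sim):
    """Average pairwise similarity over subsets selected from perturbed training sets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets (feature indices) chosen on three resamples of a 10-feature dataset.
runs = [{0, 3, 7}, {0, 3, 8, 9}, {0, 3, 7}]
print(round(stability(runs, jaccard), 3))  # → 0.6
print(round(stability(runs, lambda a, b: hamming_similarity(a, b, 10)), 3))  # → 0.8
```

    Values closer to 1 indicate that the selector returns nearly the same features regardless of small changes in the training data.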
  • Keywords
    data analysis; data mining; pattern classification; application domains; big data; bioinformatics; classification performance; computational efficiencies; data mining task; feature ranker stability; feature subset selection techniques; high-dimensional datasets; stability analysis; subset evaluation stability; Hamming distance; Indexes; Size measurement; Stability criteria; Thermal stability; Training; Feature selection; similarity measure; stability; stability measure; subset evaluation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE 14th International Conference on Information Reuse and Integration (IRI)
  • Conference_Location
    San Francisco, CA
  • Type
    conf
  • DOI
    10.1109/IRI.2013.6642502
  • Filename
    6642502