• DocumentCode
    263156
  • Title
    Filter feature selection performance comparison in high-dimensional data: A theoretical and empirical analysis of most popular algorithms
  • Author
    Huertas, Carlos; Juarez-Ramirez, Reyes

  • Author_Institution
    Dept. of Comput. Sci., Autonomous Univ. of Baja California, Tijuana, Mexico
  • fYear
    2014
  • fDate
    7-10 July 2014
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The key idea behind feature selection is to find a subset of features that produces results similar to or better than those of the original set while being more compact. Algorithms in this area can be grouped into filter, wrapper, and hybrid approaches; however, for very high-dimensional data, the filter approach has been found to be preferable because it is less computationally expensive. In this paper we study how the information explosion has affected solutions for feature selection. A theoretical analysis is reviewed, followed by an empirical comparison of 10 of the most popular filter algorithms on datasets ranging from 2,400 to 100,000 features, in order to observe algorithm performance and scalability and to detect current open problems. The results suggest that some of the currently most popular solutions may become obsolete in the future due to the increase in dataset complexity.
  • Keywords
    data analysis; feature selection; learning (artificial intelligence); dataset complexity; filter algorithms; filter feature selection performance; high dimensional data; information explosion; Algorithm design and analysis; Classification algorithms; Feature extraction; Filtering algorithms; Information filters; Noise measurement; algorithm; filter
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Fusion (FUSION), 2014 17th International Conference on
  • Conference_Location
    Salamanca
  • Type
    conf
  • Filename
    6916192