  • DocumentCode
    570183
  • Title
    Exploring an iterative feature selection technique for highly imbalanced data sets
  • Author
    Khoshgoftaar, Taghi M.; Gao, Kehan; Napolitano, Amri
  • Author_Institution
    Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2012
  • fDate
    8-10 Aug. 2012
  • Firstpage
    101
  • Lastpage
    108
  • Abstract
    The quality of a classification model can be degraded by two characteristics of its training data set: (1) an excessive number of features and (2) an imbalanced distribution between the two classes of a binary classification problem. This paper presents an iterative feature selection method that addresses both problems. The method repeatedly applies data sampling followed by feature ranking, and then aggregates the rankings produced across the iterations. In this study, we investigate a number of feature ranking techniques together with a data sampling method applied at two different post-sampling proportions between the two classes. We compare the iterative technique with a non-iterative approach in which data sampling and feature ranking are applied together only once. The empirical study is carried out on two groups of highly imbalanced data sets from a real-world software system. The results demonstrate that, on average, the proposed iterative feature selection technique performs better than the non-iterative approach.
  • Keywords
    data handling; feature extraction; iterative methods; pattern classification; sampling methods; binary classification problem; classification model; data sampling method; feature ranking; highly imbalanced data sets; imbalanced distributions; iterative feature selection technique; post-sampling proportions; training data set; Frequency modulation; Iterative methods; Radio frequency
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4673-2282-9
  • Electronic_ISBN
    978-1-4673-2283-6
  • Type
    conf
  • DOI
    10.1109/IRI.2012.6302997
  • Filename
    6302997
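The iterative procedure summarized in the abstract (repeated data sampling, then feature ranking, then aggregation of the per-iteration results) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract names neither the sampling method, the rankers, nor the aggregation rule, so the sketch assumes random undersampling of the majority class, a hypothetical signal-to-noise style ranker (`snr_scores`) standing in for the paper's feature ranking techniques, and mean-rank aggregation. All function names and parameters (`random_undersample`, `iterative_feature_selection`, `n_iter`, `minority_fraction`) are hypothetical. The post-sampling proportion is expressed as the minority-class fraction, e.g. 0.5 for a balanced 50:50 split; setting `n_iter=1` gives the non-iterative baseline (sampling and ranking applied once).

```python
import numpy as np


def random_undersample(X, y, minority_fraction, rng):
    """Assumed sampler: randomly drop majority-class rows (label 0) so that
    minority rows (label 1) make up `minority_fraction` of the sample."""
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    n_min = len(minority_idx)
    # Majority rows to keep so that n_min / (n_min + n_maj) ~= minority_fraction.
    n_maj = int(round(n_min * (1 - minority_fraction) / minority_fraction))
    n_maj = min(n_maj, len(majority_idx))
    kept_majority = rng.choice(majority_idx, size=n_maj, replace=False)
    idx = np.concatenate([minority_idx, kept_majority])
    return X[idx], y[idx]


def snr_scores(X, y):
    """Hypothetical stand-in ranker: |mean_1 - mean_0| / (std_1 + std_0)."""
    pos, neg = X[y == 1], X[y == 0]
    denom = pos.std(axis=0) + neg.std(axis=0) + 1e-12  # avoid division by zero
    return np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / denom


def scores_to_ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest-scoring feature."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


def iterative_feature_selection(X, y, n_iter=10, minority_fraction=0.5,
                                ranker=snr_scores, seed=0):
    """Repeat (sample -> rank) n_iter times, then aggregate by mean rank.
    Returns feature indices ordered from most to least relevant."""
    rng = np.random.default_rng(seed)
    all_ranks = []
    for _ in range(n_iter):
        Xs, ys = random_undersample(X, y, minority_fraction, rng)
        all_ranks.append(scores_to_ranks(ranker(Xs, ys)))
    mean_rank = np.mean(all_ranks, axis=0)
    return np.argsort(mean_rank, kind="stable")


# Usage on synthetic, highly imbalanced data (about 5% minority class),
# where only feature 0 depends on the class label.
rng = np.random.default_rng(42)
n = 1000
y = (rng.random(n) < 0.05).astype(int)
X = rng.normal(size=(n, 5))
X[:, 0] += 2.0 * y
top = iterative_feature_selection(X, y, n_iter=10, minority_fraction=0.5)
print("features ordered by mean rank:", top)
```

Averaging ranks rather than raw scores keeps the aggregation independent of each ranker's score scale; other aggregation rules (for example, median rank) would slot into the same loop.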