DocumentCode :
570183
Title :
Exploring an iterative feature selection technique for highly imbalanced data sets
Author :
Khoshgoftaar, Taghi M. ; Gao, Kehan ; Napolitano, Amri
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2012
fDate :
8-10 Aug. 2012
Firstpage :
101
Lastpage :
108
Abstract :
The quality of a classification model is affected by two factors in a training data set: (1) the presence of excessive features and (2) the presence of imbalanced distributions between two classes in a binary classification problem. This paper presents an iterative feature selection method to deal with these two problems. The proposed method consists of an iterative process of data sampling followed by feature ranking and finally aggregating the results generated during the iterative process. In this study, we investigate a number of feature ranking techniques and a data sampling method with two different post-sampling proportions between the two classes. We compare the iterative feature selection technique to the one where a data sampling and a feature ranking technique are used together but only once (without iteration). The empirical study is carried out on two groups of highly imbalanced data sets from a real-world software system. The results demonstrate that our proposed iterative feature selection technique performs on average better than the method without iteration.
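The loop described in the abstract (sample the data, rank the features, repeat, then aggregate the rankings) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract names neither the sampling method, the ranking techniques, nor the aggregation rule, so random undersampling of the majority class, a class-mean-difference ranker, and rank-sum aggregation are all assumptions made here. Every function and parameter name (`undersample`, `rank_features`, `iterative_feature_selection`, `minority_share`, `top_k`) is hypothetical.

```python
import random
from collections import defaultdict


def undersample(X, y, minority_share, rng):
    """Randomly drop majority rows (label 0) so that the minority class
    (label 1) makes up roughly `minority_share` of the sample.
    ASSUMPTION: the abstract does not say which sampling method is used."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    n_major = round(len(minority) * (1 - minority_share) / minority_share)
    n_major = max(1, min(n_major, len(majority)))
    keep = minority + rng.sample(majority, n_major)
    return [X[i] for i in keep], [y[i] for i in keep]


def rank_features(X, y):
    """Return feature indices ordered best first.
    ASSUMPTION: the score is the absolute difference of class means,
    a stand-in for the (unnamed) ranking techniques in the paper."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def iterative_feature_selection(X, y, n_iterations=10, minority_share=0.5,
                                top_k=2, seed=0):
    """Repeat sampling + ranking, then aggregate by summing rank positions.
    ASSUMPTION: rank-sum aggregation; the abstract only says the results
    of the iterations are aggregated."""
    rng = random.Random(seed)
    rank_sum = defaultdict(int)
    for _ in range(n_iterations):
        Xs, ys = undersample(X, y, minority_share, rng)
        for position, feature in enumerate(rank_features(Xs, ys)):
            rank_sum[feature] += position
    # Lower total rank is better; ties are broken by feature index.
    ordered = sorted(rank_sum, key=lambda f: (rank_sum[f], f))
    return ordered[:top_k]
```

Setting `n_iterations=1` corresponds to the non-iterative baseline the abstract compares against, where sampling and ranking are applied together only once. Changing `minority_share` is one way to model the different post-sampling class proportions mentioned in the abstract.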
Keywords :
data handling; feature extraction; iterative methods; pattern classification; sampling methods; binary classification problem; classification model; data sampling method; feature ranking; highly imbalanced data sets; imbalanced distributions; iterative feature selection technique; post-sampling proportions; training data set; Frequency modulation; Iterative methods; Radio frequency;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4673-2282-9
Electronic_ISBN :
978-1-4673-2283-6
Type :
conf
DOI :
10.1109/IRI.2012.6302997
Filename :
6302997