Title :
Measuring Stability of Threshold-Based Feature Selection Techniques
Author :
Wang, Huanjing ; Khoshgoftaar, Taghi M.
Author_Institution :
Western Kentucky Univ., Bowling Green, KY, USA
Abstract :
Feature selection has been applied in many domains, such as text mining and software engineering. Ideally, a feature selection technique should produce consistent outputs regardless of minor variations in the input data. Researchers have recently begun to examine the stability (robustness) of feature selection techniques. The stability of a feature selection method is defined as the degree of agreement between its outputs on randomly selected subsets of the same input data. This study evaluated the stability of 11 threshold-based feature ranking techniques (rankers) when applied to 16 real-world software measurement datasets of different sizes. Experimental results demonstrate that AUC (Area Under the Receiver Operating Characteristic Curve) and PRC (Area Under the Precision-Recall Curve) performed best among the 11 rankers.
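The stability definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say which agreement measure or rankers were used, so the Jaccard similarity, the class-mean-difference ranker, and the names `rank_features`, `jaccard`, and `stability` are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def rank_features(rows, labels):
    """Hypothetical stand-in ranker (not one of the paper's 11 rankers):
    order features by the absolute difference between their class means."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # cannot score a feature with one class missing
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    # Feature indices, best score first
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def jaccard(a, b):
    """Agreement between two feature subsets (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(rows, labels, k=2, n_subsets=10, fraction=0.8, seed=0):
    """Average pairwise agreement of the top-k features selected on
    randomly drawn subsets of the same data. Requires n_subsets >= 2."""
    if n_subsets < 2:
        raise ValueError("n_subsets must be at least 2")
    rng = random.Random(seed)
    n = len(rows)
    size = max(1, int(fraction * n))
    selected = []
    for _ in range(n_subsets):
        idx = rng.sample(range(n), size)  # sample instances without replacement
        ranking = rank_features([rows[i] for i in idx],
                                [labels[i] for i in idx])
        selected.append(ranking[:k])
    pairs = list(combinations(selected, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the ranker selects nearly the same features on every subset; a value near 0 means the selections rarely overlap. Any other set-agreement measure could be swapped in for `jaccard` without changing the rest of the sketch.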
Keywords :
data handling; software metrics; area under the precision-recall curve; area under the receiver operating characteristic curve; randomly selected subsets; software engineering; software measurement datasets; stability measurement; text mining; threshold based feature selection techniques; Computational modeling; Indexes; Measurement; Robustness; Software; Stability criteria; stability; threshold-based feature ranking
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4577-2068-0
ISSN :
1082-3409
DOI :
10.1109/ICTAI.2011.169