Regional Information Center for Science and Technology (RICEST) - An Empirical Study on the Stability of Feature Selection for Imbalanced Software Engineering Data

DocumentCode :

589273

Title :

An Empirical Study on the Stability of Feature Selection for Imbalanced Software Engineering Data

Author :

Wang, Huanjing ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio

Volume :

fYear :

2012

fDate :

12-15 Dec. 2012

Firstpage :

317

Lastpage :

323

Abstract :

In software quality modeling, software metrics are collected during the software development cycle. However, not all metrics are relevant to the class attribute (software quality). Metric (feature) selection has become the cornerstone of many software quality classification problems. Selecting the software metrics that are important for software quality classification is a necessary and critical step before the model training process. Recently, the robustness (e.g., stability) of feature selection techniques has been studied to examine how sensitive these techniques are to changes in the data (adding program modules to, or removing them from, the dataset). This work provides an empirical study of the stability of feature selection techniques across six software metrics datasets with varying levels of class balance. In this work, eighteen feature selection techniques are evaluated. Moreover, three factors (feature subset size, degree of perturbation, and class balance of the datasets) are considered in evaluating the stability of feature selection techniques. Experimental results show that these factors affect the stability of feature selection techniques, as one might expect. We found that, with few exceptions, feature rankings based on highly imbalanced datasets are less stable than those based on slightly imbalanced data. Results also show that making smaller changes to the datasets has less impact on the stability of feature ranking techniques. Overall, we conclude that a careful understanding of one's dataset (and certain choices of metric selection technique) can help practitioners build more reliable software quality models.
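
The notion of stability described in the abstract can be illustrated with a minimal sketch: rank the features on the full dataset, rank them again on randomly reduced copies of it, and measure how much the selected subsets overlap. Everything in the block is an assumption made for illustration and is not taken from the paper: the function names (`rank_features`, `jaccard`, `stability`), the toy filter that scores a feature by the difference of its class means, and the use of Jaccard similarity as the overlap measure. The paper's own eighteen rankers and its stability measure are not specified in this record.

```python
import random


def rank_features(rows, labels, k):
    """Return the indices of the top-k features as a set.

    Hypothetical toy filter: a feature's score is the absolute difference
    between its mean in class 1 and its mean in class 0.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            # A subsample may lose the minority class entirely.
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def jaccard(a, b):
    """Overlap of two non-empty feature subsets, between 0 and 1."""
    return len(a & b) / len(a | b)


def stability(rows, labels, k, keep_fraction, n_trials, seed=0):
    """Average overlap between the full-data subset and subsampled subsets.

    keep_fraction controls the degree of perturbation: 1.0 keeps every
    module, smaller values remove more modules before re-ranking.
    """
    rng = random.Random(seed)
    reference = rank_features(rows, labels, k)
    n_keep = max(1, int(round(keep_fraction * len(rows))))
    total = 0.0
    for _ in range(n_trials):
        idx = rng.sample(range(len(rows)), n_keep)
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        total += jaccard(reference, rank_features(sub_rows, sub_labels, k))
    return total / n_trials
```

Under these assumptions, the three factors named in the abstract map onto the sketch as follows: `k` is the feature subset size, `keep_fraction` is the degree of perturbation, and the class balance is a property of `labels`.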

Keywords :

pattern classification; software metrics; software quality; feature ranking; feature selection stability; imbalanced software engineering data; model training process; software development cycle; software quality classification problems; software quality modeling; Indexes; Measurement; Radio frequency; Stability criteria; imbalanced data; stability; subsample;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning and Applications (ICMLA), 2012 11th International Conference on

Conference_Location :

Boca Raton, FL

Print_ISBN :

978-1-4673-4651-1

Type :

conf

DOI :

10.1109/ICMLA.2012.60

Filename :

6406682

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=589273