Regional Information Center for Science and Technology (RICeST) - Impact of Data Sampling on Stability of Feature Selection for Software Measurement Data

DocumentCode :

2652581

Title :

Impact of Data Sampling on Stability of Feature Selection for Software Measurement Data

Author :

Gao, Kehan ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio

Author_Institution :

Eastern Connecticut State Univ., Willimantic, CT, USA

fYear :

2011

fDate :

7-9 Nov. 2011

Firstpage :

1004

Lastpage :

1011

Abstract :

Software defect prediction can be considered a binary classification problem. Practitioners typically use historical software data, including metric and fault data collected during the software development process, to build a classification model and then use this model to classify new program modules as either fault-prone (fp) or not-fault-prone (nfp). Limited project resources can then be allocated according to the prediction results, for example by assigning more reviews and testing to the modules predicted to be potentially defective. Two challenges often arise in the modeling process: (1) high dimensionality of software measurement data and (2) skewed or imbalanced distributions between the two types of modules (fp and nfp) in those datasets. To overcome these problems, extensive studies have been dedicated to improving the quality of training data. The commonly used techniques are feature selection and data sampling. Usually, researchers focus on evaluating classification performance after the training data is modified. The present study assesses a feature selection technique from a different perspective: the stability of the feature selection method, and in particular the impact of data sampling techniques on that stability when features are selected from the sampled data. Several interesting findings emerge from two case studies performed on datasets from two real-world software projects.
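The idea of measuring how data sampling affects feature selection stability can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, since the abstract does not name the samplers, rankers, or stability measure the authors used: the data are synthetic, the sampler is plain random undersampling of the nfp class, the ranker is the absolute difference of class means, and stability is the average pairwise Jaccard similarity of the selected top-k subsets. All function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def make_data(n_nfp=200, n_fp=20, n_features=10, seed=0):
    """Synthetic imbalanced data (assumption): features 0-2 are shifted for fp modules."""
    rng = random.Random(seed)
    rows = []
    for label, count in ((0, n_nfp), (1, n_fp)):
        for _ in range(count):
            x = [rng.gauss(1.5 if (label == 1 and j < 3) else 0.0, 1.0)
                 for j in range(n_features)]
            rows.append((x, label))
    return rows


def undersample(rows, rng):
    """Random undersampling: keep every fp row and an equal number of nfp rows."""
    fp = [r for r in rows if r[1] == 1]
    nfp = [r for r in rows if r[1] == 0]
    return fp + rng.sample(nfp, len(fp))


def top_k_features(rows, k):
    """Rank features by |mean(fp) - mean(nfp)| (an assumed filter) and keep the top k."""
    n_features = len(rows[0][0])
    scores = []
    for j in range(n_features):
        fp_mean = mean(x[j] for x, y in rows if y == 1)
        nfp_mean = mean(x[j] for x, y in rows if y == 0)
        scores.append((abs(fp_mean - nfp_mean), j))
    return frozenset(j for _, j in sorted(scores, reverse=True)[:k])


def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    return len(a & b) / len(a | b)


def stability(subsets):
    """Average pairwise Jaccard similarity across all selected subsets."""
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


def run(n_runs=10, k=3, seed=1):
    """Repeat sampling + selection and report how consistent the subsets are."""
    rng = random.Random(seed)
    data = make_data()
    subsets = [top_k_features(undersample(data, rng), k) for _ in range(n_runs)]
    return stability(subsets)
```

Calling `run()` returns a value between 0 and 1; values near 1 mean the same features are selected regardless of which sampled dataset was used. Swapping `undersample` for another sampler and comparing the resulting scores is one way to probe the kind of question the abstract raises.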

Keywords :

pattern classification; sampling methods; software development management; software fault tolerance; software metrics; binary classification problem; classification performance; data sampling; fault data; fault prone program modules; feature selection stability; metric data; not-fault-prone program modules; real world software projects; software defect prediction; software measurement data; Indexes; Integrated circuits; Software; Software measurement; Stability criteria; data sampling; defect prediction; feature selection; software metrics; stability;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on

Conference_Location :

Boca Raton, FL

ISSN :

1082-3409

Print_ISBN :

978-1-4577-2068-0

Electronic_ISBN :

1082-3409

Type :

conf

DOI :

10.1109/ICTAI.2011.172

Filename :

6103463

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2652581