DocumentCode
589292
Title
A Hybrid Approach to Coping with High Dimensionality and Class Imbalance for Software Defect Prediction
Author
Gao, Kehan; Khoshgoftaar, Taghi M.; Napolitano, Antonio
Author_Institution
Eastern Connecticut State Univ., Willimantic, CT, USA
Volume
2
fYear
2012
fDate
12-15 Dec. 2012
Firstpage
281
Lastpage
288
Abstract
High dimensionality and class imbalance are the two main problems affecting software defect prediction. In this paper, we propose a new technique, named SelectRUSBoost, which is a form of ensemble learning that incorporates data sampling to alleviate class imbalance and feature selection to resolve high dimensionality. To evaluate the effectiveness of the new technique, we apply it to a group of datasets in the context of software defect prediction. We employ two classification learners and six feature selection techniques. We compare the technique to the approach where feature selection and data sampling are used together, as well as to the case where feature selection is used alone (with no sampling at all). The experimental results demonstrate that the SelectRUSBoost technique is more effective at improving classification performance than the other approaches.
Keywords
data handling; learning (artificial intelligence); software engineering; SelectRUSBoost; class imbalance; data sampling; ensemble learning; high dimensionality; software defect prediction; Boosting; Data models; Measurement; Prediction algorithms; Predictive models; Software; Support vector machines
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location
Boca Raton, FL
Print_ISBN
978-1-4673-4651-1
Type
conf
DOI
10.1109/ICMLA.2012.145
Filename
6406710