DocumentCode :
1733526
Title :
Improving Software Quality Estimation by Combining Boosting and Feature Selection
Author :
Gao, Kehan ; Khoshgoftaar, Taghi ; Napolitano, Antonio
Author_Institution :
Eastern Connecticut State Univ., CT, USA
Volume :
1
fYear :
2013
Firstpage :
27
Lastpage :
33
Abstract :
The predictive accuracy of a classification model is often affected by the quality of the training data. Two problems in particular may degrade that quality: high dimensionality (too many independent attributes in a dataset) and class imbalance (many more instances of one class than of the other in a binary classification problem). In this study, we present an iterative feature selection approach that works with an ensemble learning method to address both problems. The iterative feature selection approach samples the dataset k times and applies feature ranking to each sampled dataset; the k different rankings are then aggregated to create a single feature ranking. The ensemble learning method used is RUSBoost, in which random undersampling (RUS) is integrated into a boosting algorithm. The main purpose of this paper is to investigate the impact of feature selection and of the RUSBoost approach on classification performance in the context of software quality prediction. In the experiment, we explore six rankers, each used along with RUS in the iterative feature selection process. Following feature selection, models are built either with a plain learner or with the RUSBoost algorithm. We also examine the case of no feature selection and use it as the baseline for comparison. The experimental results demonstrate that, with the exception of one learner, feature selection combined with boosting provides better classification performance than either technique applied alone or neither applied.
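The iterative feature selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ranker (absolute difference of class means) is a hypothetical stand-in for the six rankers the paper evaluates, aggregation by summed rank position is assumed because the abstract does not say how the k rankings are combined, and all function and parameter names (`random_undersample`, `rank_features`, `iterative_feature_selection`, `top_n`) are invented for illustration.

```python
import random
from statistics import mean


def random_undersample(X, y, rng):
    """Keep every minority-class instance plus an equal-sized random
    subset of the majority class (random undersampling, RUS)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def rank_features(X, y):
    """Hypothetical stand-in ranker: score each feature by the absolute
    difference between its class means. Returns feature indices, best first."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        m1 = mean([row[j] for row, label in zip(X, y) if label == 1])
        m0 = mean([row[j] for row, label in zip(X, y) if label == 0])
        scores.append(abs(m1 - m0))
    return sorted(range(n_features), key=lambda j: -scores[j])


def iterative_feature_selection(X, y, k=10, top_n=2, seed=0):
    """Sample the data k times with RUS, rank the features on each sample,
    and aggregate the k rankings into one (assumed here: lowest summed rank
    position wins). Returns the indices of the top_n features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    rank_sums = [0] * n_features
    for _ in range(k):
        Xs, ys = random_undersample(X, y, rng)
        for position, j in enumerate(rank_features(Xs, ys)):
            rank_sums[j] += position
    order = sorted(range(n_features), key=lambda j: rank_sums[j])
    return order[:top_n]
```

In this sketch the selected feature indices would then be used to project the training data before fitting either a plain learner or a boosting ensemble such as RUSBoost, which is outside the scope of the example.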
Keywords :
iterative methods; learning (artificial intelligence); software metrics; software quality; RUSBoost; binary-classification problem; boosting algorithm; class imbalance; classification model; ensemble learning; feature ranking; iterative feature selection; software quality estimation; software quality prediction; Boosting; Data models; Iterative methods; Measurement; Predictive models; Radio frequency; Software algorithms; Data Sampling; Feature Selection; Performance Metric; RUSBoost; Software Quality Classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location :
Miami, FL
Type :
conf
DOI :
10.1109/ICMLA.2013.13
Filename :
6784583