DocumentCode
88213
Title
Two-Stage Cost-Sensitive Learning for Software Defect Prediction
Author
Mingxia Liu ; Linsong Miao ; Daoqiang Zhang
Author_Institution
Sch. of Comput. Sci. & Technol., Nanjing Univ. of Aeronaut. & Astronaut., Nanjing, China
Volume
63
Issue
2
fYear
2014
fDate
Jun-14
Firstpage
676
Lastpage
686
Abstract
Software defect prediction (SDP), which classifies software modules into defect-prone and not-defect-prone categories, provides an effective way to maintain high-quality software systems. Most existing SDP models attempt to attain lower classification error rates rather than lower misclassification costs. However, in many real-world applications, misclassifying defect-prone modules as not-defect-prone ones usually leads to higher costs than misclassifying not-defect-prone modules as defect-prone ones. In this paper, we first propose a new two-stage cost-sensitive learning (TSCS) method for SDP, which utilizes cost information not only in the classification stage but also in the feature selection stage. Then, specifically for the feature selection stage, we develop three novel cost-sensitive feature selection algorithms, namely, Cost-Sensitive Variance Score (CSVS), Cost-Sensitive Laplacian Score (CSLS), and Cost-Sensitive Constraint Score (CSCS), by incorporating cost information into traditional feature selection algorithms. The proposed methods are evaluated on seven real data sets from NASA projects. Experimental results suggest that our TSCS method achieves better performance in software defect prediction compared to existing single-stage cost-sensitive classifiers. Also, our experiments show that the proposed cost-sensitive feature selection methods outperform traditional cost-blind feature selection methods, validating the efficacy of using cost information in the feature selection stage.
Keywords
feature selection; learning (artificial intelligence); matrix algebra; pattern classification; software reliability; CSCS; CSLS; CSVS; SDP; TSCS; classification error rates; cost-sensitive Laplacian score; cost-sensitive constraint score; cost-sensitive variance score; feature selection algorithms; feature selection stage; software defect prediction; two-stage cost-sensitive learning; Neural networks; Prediction algorithms; Software algorithms; Software metrics; Software systems; Cost-sensitive learning
fLanguage
English
Journal_Title
Reliability, IEEE Transactions on
Publisher
IEEE
ISSN
0018-9529
Type
jour
DOI
10.1109/TR.2014.2316951
Filename
6803085