  • DocumentCode
    88213
  • Title
    Two-Stage Cost-Sensitive Learning for Software Defect Prediction
  • Author
    Mingxia Liu ; Linsong Miao ; Daoqiang Zhang
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Nanjing Univ. of Aeronaut. & Astronaut., Nanjing, China
  • Volume
    63
  • Issue
    2
  • fYear
    2014
  • fDate
    June 2014
  • Firstpage
    676
  • Lastpage
    686
  • Abstract
    Software defect prediction (SDP), which classifies software modules into defect-prone and not-defect-prone categories, provides an effective way to maintain high-quality software systems. Most existing SDP models attempt to attain lower classification error rates rather than lower misclassification costs. However, in many real-world applications, misclassifying defect-prone modules as not-defect-prone ones usually leads to higher costs than misclassifying not-defect-prone modules as defect-prone ones. In this paper, we first propose a new two-stage cost-sensitive learning (TSCS) method for SDP, which utilizes cost information not only in the classification stage but also in the feature selection stage. Then, specifically for the feature selection stage, we develop three novel cost-sensitive feature selection algorithms, namely, Cost-Sensitive Variance Score (CSVS), Cost-Sensitive Laplacian Score (CSLS), and Cost-Sensitive Constraint Score (CSCS), by incorporating cost information into traditional feature selection algorithms. The proposed methods are evaluated on seven real data sets from NASA projects. Experimental results suggest that our TSCS method achieves better performance in software defect prediction compared to existing single-stage cost-sensitive classifiers. Also, our experiments show that the proposed cost-sensitive feature selection methods outperform traditional cost-blind feature selection methods, validating the efficacy of using cost information in the feature selection stage.
  • Keywords
    feature selection; learning (artificial intelligence); matrix algebra; pattern classification; software reliability; CSCS; CSLS; CSVS; SDP; TSCS; classification error rates; cost-sensitive Laplacian score; cost-sensitive constraint score; cost-sensitive variance score; feature selection algorithms; feature selection stage; software defect prediction; two-stage cost-sensitive learning; Neural networks; Prediction algorithms; Software algorithms; Software metrics; Software systems; Cost-sensitive learning; feature selection; software defect prediction;
  • fLanguage
    English
  • Journal_Title
    Reliability, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9529
  • Type
    jour
  • DOI
    10.1109/TR.2014.2316951
  • Filename
    6803085