• DocumentCode
    39205
  • Title
    Using Class Imbalance Learning for Software Defect Prediction
  • Author
    Shuo Wang ; Xin Yao
  • Author_Institution
    Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), University of Birmingham, Birmingham, UK
  • Volume
    62
  • Issue
    2
  • fYear
    2013
  • fDate
    Jun-13
  • Firstpage
    434
  • Lastpage
    443
  • Abstract
    To facilitate software testing and reduce testing costs, a wide range of machine learning methods have been studied to predict defects in software modules. Unfortunately, the imbalanced nature of this type of data makes the learning task more difficult. Class imbalance learning specializes in tackling classification problems with imbalanced distributions, which could be helpful for defect prediction, but it has not been investigated in depth so far. In this paper, we study whether and how class imbalance learning methods can benefit software defect prediction, with the aim of finding better solutions. We investigate different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms. Among the methods we studied, AdaBoost.NC shows the best overall performance in terms of measures including balance, G-mean, and Area Under the Curve (AUC). To further improve the performance of the algorithm and facilitate its use in software defect prediction, we propose a dynamic version of AdaBoost.NC, which adjusts its parameter automatically during training. Without the need to pre-define any parameters, it is shown to be more effective and efficient than the original AdaBoost.NC.
  • Keywords
    learning (artificial intelligence); pattern classification; program diagnostics; program testing; sampling methods; AdaBoost.NC; G-mean; area under the curve; automatic parameter adjustment; class imbalance learning; classification problems; ensemble algorithm; imbalanced distribution; learning difficulty; machine learning method; resampling technique; software module defect prediction; software testing; testing cost saving; threshold moving; Learning systems; Measurement; Niobium; Prediction algorithms; Software; Software algorithms; Training; Class imbalance learning; ensemble learning; negative correlation learning; software defect prediction;
  • fLanguage
    English
  • Journal_Title
    Reliability, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9529
  • Type
    jour
  • DOI
    10.1109/TR.2013.2259203
  • Filename
    6509481
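
The abstract names balance and G-mean among its evaluation measures but, being an abstract, does not define them. As a minimal sketch, assuming the definitions commonly used in the defect prediction literature (G-mean as the geometric mean of the recall on each class, and balance as one minus the normalized Euclidean distance from the ideal point PD = 1, PF = 0), they could be computed from a confusion matrix as follows. The function names and the example counts are illustrative and are not taken from the paper.

```python
import math


def rates(tp, fn, fp, tn):
    """Return (PD, PF) from confusion-matrix counts.

    PD: probability of detection (recall on the defect class).
    PF: probability of false alarm (false positive rate).
    Assumes the defect class is the positive class.
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must have at least one example")
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)
    return pd, pf


def g_mean(pd, pf):
    """Geometric mean of the recall on each class (assumed definition)."""
    return math.sqrt(pd * (1.0 - pf))


def balance(pd, pf):
    """One minus the normalized distance from the ideal point PD=1, PF=0
    (assumed definition)."""
    distance = math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2)
    return 1.0 - distance / math.sqrt(2.0)


if __name__ == "__main__":
    # Hypothetical counts: 40 defective modules, 160 defect-free modules.
    pd, pf = rates(tp=30, fn=10, fp=20, tn=140)
    print(f"PD={pd:.3f} PF={pf:.3f}")
    print(f"G-mean={g_mean(pd, pf):.4f} balance={balance(pd, pf):.4f}")
```

Both measures reward a classifier only when it performs well on the minority (defect) class and the majority class at the same time, which is why they tend to be preferred over plain accuracy on imbalanced data.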