• DocumentCode
    185641
  • Title
    Defect Prediction between Software Versions with Active Learning and Dimensionality Reduction
  • Author
    Lu, Huihua ; Kocaguneli, Ekrem ; Cukic, Bojan
  • Author_Institution
    Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
  • fYear
    2014
  • fDate
    3-6 Nov. 2014
  • Firstpage
    312
  • Lastpage
    322
  • Abstract
    Accurate detection of defects prior to product release helps software engineers focus verification activities on defect-prone modules, thus improving the effectiveness of software development. A common scenario is to use the defects from prior releases to build the prediction model for the upcoming release, typically through a supervised learning method. As software development is a dynamic process, fault characteristics in subsequent releases may vary. Therefore, supplementing the defect information from prior releases with limited information about defects detected early in the current release seems to offer intuitive and practical benefits. We propose active learning as a way to automate the development of models which improve the performance of defect prediction between successive releases. Our results show that the integration of active learning with uncertainty sampling consistently outperforms the corresponding supervised learning approach. We further improve the prediction performance with feature compression techniques, where feature selection or dimensionality reduction is applied to defect data prior to active learning. We observe that dimensionality reduction techniques, particularly multidimensional scaling with random forest similarity, work better than feature selection due to their ability to identify and combine essential information in data set features. We present the improvements offered by this methodology through the prediction of defective modules in three successive versions of Eclipse.
  • Keywords
    configuration management; feature selection; learning (artificial intelligence); program verification; software fault tolerance; uncertainty handling; Eclipse; active learning; data set features; defect information; defect prediction; defect prone modules; defective modules; defects detection; dimensionality reduction; fault characteristics; feature compression techniques; feature selection; multidimensional scaling; prediction model; random forest similarity; software development; software versions; supervised learning method; uncertainty sampling; verification activities; Computational modeling; Measurement; Predictive models; Radio frequency; Software; Supervised learning; Uncertainty; Active learning; Complexity measures; Dimensionality reduction; Machine learning; Software defect prediction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Reliability Engineering (ISSRE), 2014 IEEE 25th International Symposium on
  • Conference_Location
    Naples
  • ISSN
    1071-9458
  • Print_ISBN
    978-1-4799-6032-3
  • Type
    conf
  • DOI
    10.1109/ISSRE.2014.35
  • Filename
    6982637