  • DocumentCode
    21145
  • Title
    Reducing Features to Improve Code Change-Based Bug Prediction
  • Author
    Shivaji, S.; Whitehead, E. James; Akella, R.; Kim, Sunghun
  • Author_Institution
    Dept. of Comput. Sci., Univ. of California, Santa Cruz, Santa Cruz, CA, USA
  • Volume
    39
  • Issue
    4
  • fYear
    2013
  • fDate
    Apr. 2013
  • Firstpage
    552
  • Lastpage
    569
  • Abstract
    Machine learning classifiers have recently emerged as a way to predict the introduction of bugs in changes made to source code files. The classifier is first trained on software history, and then used to predict if an impending change causes a bug. Drawbacks of existing classifier-based bug prediction techniques are insufficient performance for practical use and slow prediction times due to a large number of machine learned features. This paper investigates multiple feature selection techniques that are generally applicable to classification-based bug prediction methods. The techniques discard less important features until optimal classification performance is reached. The total number of features used for training is substantially reduced, often to less than 10 percent of the original. The performance of Naive Bayes and Support Vector Machine (SVM) classifiers when using this technique is characterized on 11 software projects. Naive Bayes using feature selection provides significant improvement in buggy F-measure (21 percent improvement) over prior change classification bug prediction results (by the second and fourth authors [28]). The SVM's improvement in buggy F-measure is 9 percent. Interestingly, an analysis of performance for varying numbers of features shows that strong performance is achieved at even 1 percent of the original number of features.
  • Keywords
    belief networks; learning (artificial intelligence); pattern classification; program debugging; support vector machines; SVM classifier; buggy F-measure; classification performance; classifier-based bug prediction; code change-based bug prediction; feature selection technique; machine learned feature reduction; machine learning classifier; naive Bayes classifier; software history; software project; source code file; support vector machine; Computer bugs; Feature extraction; History; Machine learning; Measurement; Software; Support vector machines; Reliability; bug prediction; feature selection; machine learning;
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type
    jour
  • DOI
    10.1109/TSE.2012.43
  • Filename
    6226427
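The abstract describes discarding less important features until classification performance peaks, with results reported as buggy F-measure. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes the features are already ranked from most to least important (the abstract does not name the ranking method), uses a hypothetical halving schedule for discarding features, and takes a caller-supplied `evaluate` function that stands in for training and testing a classifier such as Naive Bayes or an SVM on a candidate subset. The names `buggy_f_measure` and `reduce_features` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple


def buggy_f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-measure of the 'buggy' class (label 1): 2PR / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # No buggy change predicted correctly: treat F-measure as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def reduce_features(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Hypothetical backward-elimination loop over a pre-ranked feature list.

    Each round keeps only the top-ranked half of the remaining features and
    scores that subset with `evaluate` (e.g. the buggy F-measure of a
    classifier trained on it). Returns the best-scoring subset; on ties the
    smaller subset wins, since it is found later and `>=` is used.
    """
    current = list(ranked)
    best_subset, best_score = current, evaluate(current)
    while len(current) > min_features:
        # Assumed schedule: discard the lower-ranked half of what remains.
        current = current[: max(min_features, len(current) // 2)]
        score = evaluate(current)
        if score >= best_score:
            best_subset, best_score = current, score
    return best_subset, best_score


if __name__ == "__main__":
    features = [f"f{i}" for i in range(1, 9)]
    # Toy scorer that peaks at two features; purely illustrative.
    subset, score = reduce_features(features, lambda fs: 1 - abs(len(fs) - 2) / 10)
    print(subset, score)  # ['f1', 'f2'] 1.0
```

The loop deliberately scans down to very small subsets rather than stopping at the first drop in score, which fits the abstract's observation that strong performance is achieved at even 1 percent of the original number of features.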