  • DocumentCode
    7701
  • Title

    Toward Comprehensible Software Fault Prediction Models Using Bayesian Network Classifiers

  • Author

    Dejaeger, K.; Verbraken, T.; Baesens, Bart

  • Author_Institution
    Dept. of Decision Sci. & Inf. Manage., Katholieke Univ. Leuven, Leuven, Belgium
  • Volume
    39
  • Issue
    2
  • fYear
    2013
  • fDate
    Feb. 2013
  • Firstpage
    237
  • Lastpage
    257
  • Abstract
    Software testing is a crucial activity during software development, and fault prediction models assist practitioners by identifying faulty software code up front, drawing upon the machine learning literature. While the Naive Bayes classifier in particular is often applied in this regard, with predictive performance and comprehensibility cited as its major strengths, a number of alternative Bayesian algorithms that increase the possibility of constructing simpler networks with fewer nodes and arcs remain unexplored. This study contributes to the literature by considering 15 different Bayesian Network (BN) classifiers and comparing them to other popular machine learning techniques. Furthermore, the applicability of the Markov blanket principle for feature selection, which is a natural extension of BN theory, is investigated. The results, in terms of both the AUC and the recently introduced H-measure, are rigorously tested using the statistical framework of Demšar. It is concluded that simple and comprehensible networks with fewer nodes can be constructed using BN classifiers other than the Naive Bayes classifier. Furthermore, it is found that comprehensibility and predictive performance need to be balanced against each other, and that the development context should also be taken into account during model selection.
  • Keywords
    Markov processes; belief networks; feature extraction; learning (artificial intelligence); pattern classification; prediction theory; program testing; software fault tolerance; statistical analysis; AUC; BN classifiers; BN theory; Bayesian network classifiers; Demšar; Markov blanket principle; Naive Bayes classifier; faulty software code; feature selection; H-measure; machine learning literature; model selection; predictive performance; software development; software fault prediction models; software testing; statistical framework; Bayesian methods; Capability maturity model; Machine learning; Measurement; Predictive models; Probability distribution; Software; Bayesian networks; Software fault prediction; classification; comprehensibility
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type

    jour

  • DOI
    10.1109/TSE.2012.20
  • Filename
    6175912