• DocumentCode
    237281
  • Title
    Automated Configuration Bug Report Prediction Using Text Mining
  • Author
    Xin Xia ; David Lo ; Weiwei Qiu ; Xingen Wang ; Bo Zhou
  • Author_Institution
    Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
  • fYear
    2014
  • fDate
    21-25 July 2014
  • Firstpage
    107
  • Lastpage
    116
  • Abstract
    Configuration bugs are one of the dominant causes of software failures. Previous studies show that a configuration bug could cause huge financial losses in a software system. The importance of configuration bugs has motivated various research studies, e.g., to detect, diagnose, and fix configuration bugs. Given a bug report, an approach that can identify whether the bug is a configuration bug could help developers reduce debugging effort. We refer to this problem as configuration bug report prediction. To address this problem, we develop a new automated framework that applies text mining techniques to the natural-language descriptions of bug reports. The framework trains a statistical model on historical bug reports with known labels (i.e., configuration or non-configuration) and then uses the model to predict the label of a new bug report. Developers could apply our model to automatically predict the labels of bug reports and thereby improve their productivity. Our tool first applies feature selection techniques (e.g., information gain and chi-square) to pre-process the textual information in bug reports, and then applies various text mining techniques (e.g., naive Bayes, SVM, and naive Bayes multinomial) to build statistical models. We evaluate our solution on five bug report datasets: accumulo, activemq, camel, flume, and wicket. We show that naive Bayes multinomial with information gain achieves the best performance. On average across the five projects, its accuracy, configuration F-measure, and non-configuration F-measure are 0.811, 0.450, and 0.880, respectively. We also compare our solution with the method proposed by Arshad et al. The results show that, on average, our approach using naive Bayes multinomial with information gain improves the accuracy, configuration F-measure, and non-configuration F-measure of Arshad et al.'s method by 8.34%, 103.7%, and 4.24%, respectively.
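The pipeline summarized in the abstract can be sketched in plain Python. It has three stages: information-gain feature selection, a naive Bayes multinomial classifier, and per-class F-measure. This is a minimal illustration under stated assumptions, not the authors' tool. Several choices are hypothetical and made only for this sketch: the regex tokenizer, information gain computed on binary term presence, add-one (Laplace) smoothing, the top-k cutoff, every function name, and the toy reports.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case and split on non-alphanumeric runs (simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the report' w.r.t. the label."""
    classes = sorted(set(labels))
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    gain = entropy([labels.count(c) for c in classes])
    for part in (with_t, without_t):
        if part:
            gain -= len(part) / len(labels) * entropy([part.count(c) for c in classes])
    return gain


def select_features(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


class MultinomialNB:
    """Multinomial naive Bayes with add-one smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Term frequencies (not just presence) within this class.
            counts = Counter(t for d in class_docs for t in d if t in self.vocab)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}
        return self

    def predict(self, doc):
        # Out-of-vocabulary tokens are ignored; each occurrence adds its log-likelihood.
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def f_measure(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy reports, not data from the paper.
    train = [
        ("server fails to start when config property port is missing", "config"),
        ("wrong default value for timeout setting in config file", "config"),
        ("null pointer exception in message parser", "nonconfig"),
        ("memory leak when closing connection", "nonconfig"),
    ]
    docs = [tokenize(text) for text, _ in train]
    labels = [label for _, label in train]
    model = MultinomialNB().fit(docs, labels, select_features(docs, labels, k=4))
    print(model.predict(tokenize("config file property ignored")))     # config
    print(model.predict(tokenize("memory leak on connection close")))  # nonconfig
```

In this sketch the information-gain step looks only at whether a term occurs in a report. The multinomial classifier, by contrast, counts every occurrence, so a word repeated in a report contributes repeatedly to its score. The configuration and non-configuration F-measures reported in the abstract correspond to calling `f_measure` once with each class as `positive`.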
  • Keywords
    data mining; program debugging; statistical analysis; text analysis; accumulo; activemq; bug detection; bug diagnosis; camel; configuration F-measure; configuration bug report prediction; debugging effort; feature selection techniques; flume; information gain; naive Bayes multinomial; natural-language description; software failure; statistical model; text mining; wicket; Buildings; Computer bugs; Feature extraction; Predictive models; Support vector machines; Text mining; Training; Configuration Bug; Data Mining; Feature Selection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Software and Applications Conference (COMPSAC), 2014 IEEE 38th Annual
  • Conference_Location
    Vasteras
  • Type
    conf
  • DOI
    10.1109/COMPSAC.2014.17
  • Filename
    6899207