DocumentCode
237281
Title
Automated Configuration Bug Report Prediction Using Text Mining
Author
Xin Xia ; Lo, Daniel ; Weiwei Qiu ; Xingen Wang ; Bo Zhou
Author_Institution
Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
fYear
2014
fDate
21-25 July 2014
Firstpage
107
Lastpage
116
Abstract
Configuration bugs are one of the dominant causes of software failures. Previous studies show that a configuration bug can cause huge financial losses in a software system. The importance of configuration bugs has attracted various research studies, e.g., to detect, diagnose, and fix configuration bugs. Given a bug report, an approach that can identify whether the bug is a configuration bug could help developers reduce debugging effort. We refer to this problem as configuration bug report prediction. To address this problem, we develop a new automated framework that applies text mining techniques to the natural-language description of bug reports to train a statistical model on historical bug reports with known labels (i.e., configuration or non-configuration); the statistical model is then used to predict a label for a new bug report. Developers could apply our model to automatically predict labels of bug reports to improve their productivity. Our tool first applies feature selection techniques (e.g., information gain and chi-square) to pre-process the textual information in bug reports, and then applies various text mining techniques (e.g., naive Bayes, SVM, and naive Bayes multinomial) to build statistical models. We evaluate our solution on five bug report datasets: Accumulo, ActiveMQ, Camel, Flume, and Wicket. We show that naive Bayes multinomial with information gain achieves the best performance. On average across the five projects, its accuracy, configuration F-measure, and non-configuration F-measure are 0.811, 0.450, and 0.880, respectively. We also compare our solution with the method proposed by Arshad et al. The results show that our proposed approach, which uses naive Bayes multinomial with information gain, on average improves the accuracy, configuration F-measure, and non-configuration F-measure scores of Arshad et al.'s method by 8.34%, 103.7%, and 4.24%, respectively.
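The pipeline described in the abstract (information-gain feature selection followed by a naive Bayes multinomial classifier) can be illustrated with a minimal sketch. This is not the authors' tool: the function names (`tokenize`, `information_gain`, `select_features`, `train_nbm`, `predict`), the Laplace smoothing, the whitespace tokenizer, and the toy bug reports are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed pre-processing: lowercase and keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    # Gain of the binary feature "term occurs in the report" w.r.t. the label.
    classes = sorted(set(labels))
    base = entropy([labels.count(c) for c in classes])
    cond = 0.0
    for present in (True, False):
        subset = [y for d, y in zip(docs, labels) if (term in d) == present]
        if subset:
            weight = len(subset) / len(labels)
            cond += weight * entropy([subset.count(c) for c in classes])
    return base - cond

def select_features(docs, labels, k):
    # Keep the k terms with the highest information gain.
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: -information_gain(docs, labels, t))
    return set(ranked[:k])

def train_nbm(docs, labels, features):
    # Multinomial naive Bayes with Laplace (add-one) smoothing.
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(t for t in d if t in features)
    log_like = {}
    for c in classes:
        total = sum(counts[c].values()) + len(features)
        log_like[c] = {t: math.log((counts[c][t] + 1) / total) for t in features}
    return log_prior, log_like

def predict(model, doc):
    # Pick the class with the highest posterior log-score.
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy, invented bug reports (not taken from the paper's datasets).
reports = [
    ("broker fails when config file property is missing", "config"),
    ("wrong default value for timeout property in config", "config"),
    ("null pointer exception in route parser", "non-config"),
    ("memory leak after closing consumer connection", "non-config"),
    ("exception thrown while parsing message header", "non-config"),
]
docs = [tokenize(text) for text, _ in reports]
labels = [label for _, label in reports]

features = select_features(docs, labels, k=5)
model = train_nbm(docs, labels, features)
print(predict(model, tokenize("timeout property ignored in config")))
print(predict(model, tokenize("null pointer exception in parser")))
```

In practice a library implementation and held-out evaluation (accuracy and per-class F-measure, as reported in the abstract) would replace this toy setup.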
Keywords
data mining; program debugging; statistical analysis; text analysis; accumulo; activemq; bug detection; bug diagnosis; camel; configuration F-measure; configuration bug report prediction; debugging effort; feature selection techniques; flume; information gain; naive Bayes multinomial; natural-language description; software failure; statistical model; text mining; wicket; Buildings; Computer bugs; Feature extraction; Predictive models; Support vector machines; Text mining; Training; Configuration Bug; Data Mining; Feature Selection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Software and Applications Conference (COMPSAC), 2014 IEEE 38th Annual
Conference_Location
Vasteras
Type
conf
DOI
10.1109/COMPSAC.2014.17
Filename
6899207