Title :
Automatic Software Bug Triage System (BTS) Based on Latent Semantic Indexing and Support Vector Machine
Author :
Ahsan, Syed Nadeem ; Ferzund, Javed ; Wotawa, Franz
Author_Institution :
Inst. for Software Technol., Tech. Univ. Graz, Graz, Austria
Abstract :
A bug triage system is used for validation and allocation of bug reports to the most appropriate developers. An automatic bug triage system may reduce software maintenance time and improve software quality through the correct and timely assignment of new bug reports to the appropriate developers. In this paper, we present the techniques behind an automatic bug triage system based on the categorization of bug reports. We used these techniques to build such a system and performed comparative experiments. We downloaded 1,983 resolved bug reports, along with the developer activity data, from the Mozilla open source project. We extracted the relevant features, such as report title and report summary, from each bug report, and extracted the names of the developers who resolved the bug reports from the developer activity data. We processed the extracted textual data and obtained the term-to-document matrix using parsing, filtering, and term weighting methods. For term weighting, we used simple term frequency and TF×IDF (term frequency-inverse document frequency) methods. Furthermore, we reduced the dimensionality of the term-to-document matrix by applying feature selection and latent semantic indexing methods. Finally, we used seven different machine learning methods to classify the bug reports. The best bug triage system obtained is based on latent semantic indexing and a support vector machine, with 44.4% classification accuracy. The average precision and recall values are 30% and 28%, respectively.
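The term weighting step named in the abstract can be illustrated with a minimal sketch. It builds a term-to-document matrix under both simple term frequency and TF×IDF weighting. The tokenizer, the stop-word list, the unsmoothed `log(N / df)` form of IDF, and the three sample report titles are all assumptions made for illustration; the abstract does not specify the authors' exact parsing, filtering, or weighting formula.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the paper's filter).
STOP_WORDS = {"the", "a", "on", "in", "when", "is", "to", "of"}

def tokenize(text):
    """Parse text into lowercase alphabetic tokens and filter stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(docs, weighting="tfidf"):
    """Return (vocab, matrix) with one row per term and one column per document.

    weighting="tf"    -> raw term counts
    weighting="tfidf" -> count * log(N / df), an unsmoothed textbook variant
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({t for c in counts for t in c})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    matrix = []
    for t in vocab:
        idf = math.log(n_docs / df[t]) if weighting == "tfidf" else 1.0
        matrix.append([c[t] * idf for c in counts])
    return vocab, matrix

# Hypothetical bug report titles, invented for this example.
reports = [
    "Crash on startup when profile is missing",
    "Toolbar icons missing after update",
    "Crash when opening bookmarks toolbar",
]

vocab, tf = term_document_matrix(reports, weighting="tf")
_, tfidf = term_document_matrix(reports, weighting="tfidf")

for term, tf_row, tfidf_row in zip(vocab, tf, tfidf):
    print(f"{term:10s} tf={tf_row} tfidf={[round(w, 3) for w in tfidf_row]}")
```

Terms that occur in fewer reports receive a larger TF×IDF weight than terms spread across many reports. In a pipeline like the one described, a matrix of this kind would then be reduced (for example by latent semantic indexing) before being passed to a classifier.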
Keywords :
grammars; indexing; information filtering; learning (artificial intelligence); matrix algebra; pattern classification; program debugging; software maintenance; software quality; support vector machines; text analysis; Mozilla open source project; TFxIDF methods; automatic software bug triage system; bug reports allocation; bug reports categorization; bug reports validation; developer activity data; feature selection; features extraction; filtering; latent semantic indexing; machine learning methods; parsing; simple term frequency methods; software maintenance time; software quality; support vector machine; term frequency inverse document frequency; term weighting methods; term-to-document matrix; Data mining; Feature extraction; Filtering; Frequency; Indexing; Learning systems; Software maintenance; Software systems; Support vector machine classification; Support vector machines; Software maintenance; bug reports; bug triage; latent semantic indexing; machine learning;
Conference_Title :
Software Engineering Advances, 2009. ICSEA '09. Fourth International Conference on
Conference_Location :
Porto
Print_ISBN :
978-1-4244-4779-4
Electronic_ISBN :
978-0-7695-3777-1
DOI :
10.1109/ICSEA.2009.92