DocumentCode :
2869589
Title :
Handling Language Variations in Open Source Bug Reporting Systems
Author :
Banerjee, Sean ; Musgrove, Jesse ; Cukic, Bojan
Author_Institution :
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
fYear :
2012
fDate :
27-30 Nov. 2012
Firstpage :
325
Lastpage :
330
Abstract :
Natural language plays a critical role in the design, development and maintenance of software systems. For example, bug reporting systems allow users to submit reports describing observed anomalies in free-form English. However, the free-form aspect makes the detection of duplicate reports a challenge due to the breadth and diversity of language used by individual reporters. Tokenization, stemming and stop word removal are commonly used techniques to normalize and reduce the language space. However, the impact of typographical errors and alternate spellings has not been analyzed in the research literature. Our research indicates that handling language problems during automated bug triage analysis can lead to a boost in performance. We show that the language used in software problem reporting is too specialized to benefit from domain-independent spell checkers or lexical databases. Therefore, we present a novel approach using word distance and neighbor word likelihood measures for detecting and resolving language-based issues in open-source software problem reporting. We evaluate our approach using the complete Firefox repository through March 2012. Our results indicate measurable improvements in duplicate detection results, while reducing the language space for the most frequently used words by 30%. Moreover, our method is language-agnostic and does not require a pre-built dictionary, thus making it suitable for use in a variety of systems.
Keywords :
computational linguistics; natural language processing; program verification; public domain software; software development management; software maintenance; spelling aids; automated bug triage analysis; language variation handling; lexical database; natural language; neighbor word likelihood measure; open source bug reporting system; open source software; resolving language-based issue detection; software development; software maintenance; software system design; spelling checker; stemming technique; stop word removal; tokenization technique; word distance; Color; Context; Databases; Dictionaries; Frequency measurement; Image color analysis; Software; Alternate Spellings; Duplicate Bug Reports; Software Maintenance; Software Reliability; String Algorithms; Typographical Errors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Reliability Engineering Workshops (ISSREW), 2012 IEEE 23rd International Symposium on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-5048-8
Type :
conf
DOI :
10.1109/ISSREW.2012.85
Filename :
6405465