DocumentCode :
2443036
Title :
Content classification of development emails
Author :
Bacchelli, Alberto ; Dal Sasso, Tommaso ; D'Ambros, Marco ; Lanza, Michele
Author_Institution :
REVEAL @ Fac. of Inf., Univ. of Lugano, Lugano, Switzerland
fYear :
2012
fDate :
2-9 June 2012
Firstpage :
375
Lastpage :
385
Abstract :
Emails related to the development of a software system contain information about design choices and issues encountered during the development process. Exploiting the knowledge embedded in emails with automatic tools is challenging, due to the unstructured, noisy, and mixed-language nature of this communication medium. Natural language text is often not well-formed and is interleaved with languages with other syntaxes, such as code or stack traces. We present an approach to classify email content at line level. Our technique classifies email lines into five categories (i.e., text, junk, code, patch, and stack trace) to allow one to subsequently apply ad hoc analysis techniques for each category. We evaluated our approach on a statistically significant set of emails gathered from mailing lists of four unrelated open source systems.
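The abstract names the five line-level categories but this record does not describe how lines are assigned to them. As a rough illustration only, a line-level classifier could be sketched with a few regular-expression rules. Everything below (the rule patterns, the rule order, the names `RULES`, `classify_line`, and `classify_email`, and the focus on Java-style code and stack traces) is an assumption made for this sketch and is not the technique evaluated in the paper.

```python
import re
from typing import List, Tuple

# Hypothetical heuristic rules, checked in order; the first match wins.
# These patterns are illustrative assumptions, not the paper's classifier.
RULES = [
    # Java-style stack trace frames and exception headers.
    ("stack trace", re.compile(
        r"^\s*(at\s+[\w.$<>]+\(.*\)|Caused by:|Exception in thread\b"
        r"|[\w.$]+(Exception|Error)(:|$))")),
    # Unified-diff headers, hunk markers, and added/removed lines.
    ("patch", re.compile(
        r"^(diff --git |Index: |--- |\+\+\+ |@@ .* @@|[+-](?![+-]))")),
    # Lines ending like a C-family statement or block delimiter.
    ("code", re.compile(r"[;{}]\s*$")),
    # Signature delimiters, separator rules, and reply attributions.
    ("junk", re.compile(
        r"^\s*(--\s*$|_{5,}|-{5,}|Sent from\b|On .* wrote:\s*$)")),
]


def classify_line(line: str) -> str:
    """Return one of: text, junk, code, patch, stack trace."""
    for label, pattern in RULES:
        if pattern.search(line):
            return label
    # Anything that matches no rule is treated as natural language text.
    return "text"


def classify_email(body: str) -> List[Tuple[str, str]]:
    """Label every line of an email body, preserving line order."""
    return [(classify_line(line), line) for line in body.splitlines()]


if __name__ == "__main__":
    sample = "\n".join([
        "I think the parser fails on empty input.",
        "java.lang.NullPointerException: input was null",
        "    at org.example.Parser.parse(Parser.java:42)",
        "+    return null;",
        "-- ",
    ])
    for label, line in classify_email(sample):
        print(f"{label:12} | {line}")
```

Fixed rules like these misfire easily (for example, a prose bullet starting with "- " is labeled as a patch line), which is one reason a real system would need a more robust approach; the sketch only shows what a per-line labeling interface might look like.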
Keywords :
electronic mail; natural language processing; pattern classification; public domain software; software engineering; text analysis; ad hoc analysis techniques; code category; code traces; content classification; development emails; junk category; natural language text; open source systems; patch category; software system development; stack traces; text category; Context; Data mining; Electronic mail; Java; Noise; Software; Text recognition; Emails; Empirical software engineering; Unstructured Data Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Engineering (ICSE), 2012 34th International Conference on
Conference_Location :
Zurich
ISSN :
0270-5257
Print_ISBN :
978-1-4673-1066-6
Electronic_ISBN :
0270-5257
Type :
conf
DOI :
10.1109/ICSE.2012.6227177
Filename :
6227177