Author :
Bacchelli, Alberto ; Bettenburg, Nicolas ; Guerrouj, Latifa
Abstract :
Software developers have long been supported by a variety of tools, such as version control systems (e.g., Git), issue tracking systems (e.g., Bugzilla), and mailing list services (e.g., Mailman). These tools accumulate a wide range of information in the repositories where they store their data. This information comprises two significantly different types of data: structured and unstructured. Structured data (e.g., source code or execution traces) has a well-established structure and grammar, and is thus straightforward to parse and process automatically. Unstructured data (e.g., documentation, discussions, comments, or customer support requests) consists of a mixture of natural language text, snippets of structured data, and noise. Mining unstructured data is very challenging because out-of-the-box approaches adopted from related fields, such as Natural Language Processing and Information Retrieval, cannot be directly applied in software engineering. To tackle the challenges faced when mining unstructured data, and to make the knowledge contained in unstructured data repositories accessible to both practitioners and researchers, we organize the 2nd Workshop on Mining Unstructured Data (MUD'12). The aim is to provide a unique interactive venue for discussing challenges, approaches, and applications in depth, and for sharing experiences and results on the topic of mining unstructured software data.