DocumentCode
650686
Title
Content Categorization of API Discussions
Author
Daqing Hou ; Lingfeng Mo
Author_Institution
Dept. of Electrical & Computer Engineering, Clarkson University, Potsdam, NY, USA
fYear
2013
fDate
22-28 Sept. 2013
Firstpage
60
Lastpage
69
Abstract
Text categorization, automatically labeling natural language text with pre-defined semantic categories, is an essential task for managing the abundant online data. An example of such data in Software Engineering is the large, ever-growing volume of forum discussions on how to use particular APIs. We have conducted a study to explore the question as to how well machine learning algorithms can be applied to categorize API discussions based on their content. Our goal is two-fold: (1) Can a relatively straightforward algorithm such as Naive Bayes work sufficiently well for this task? (2) If yes, how can we control its performance? We have achieved the best test accuracy mean (TAM) of 94.1% with our largest training data set for the AWT/Swing API, which consists of 833 forum discussions distributed over eight categories/topics. We have also investigated factors that impact classification accuracy, with the most important two being the size of the training set and multi-label documents (the phenomenon that some discussions involve more than one category).
Keywords
application program interfaces; learning (artificial intelligence); text analysis; API discussions; AWT-Swing API; Naive Bayes; TAM; automatic natural language text labeling; content categorization; machine learning algorithms; multilabel documents; online data management; pre-defined semantic category; software engineering; test accuracy mean; text categorization; training data set; Accuracy; Machine learning algorithms; Mathematical model; Message systems; Software; Training; Training data; APIs; AWT/Swing; MALLET; Naive Bayes; Online Forums; Text Categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
2013 29th IEEE International Conference on Software Maintenance (ICSM)
Conference_Location
Eindhoven
ISSN
1063-6773
Type
conf
DOI
10.1109/ICSM.2013.17
Filename
6676877