DocumentCode :
650686
Title :
Content Categorization of API Discussions
Author :
Daqing Hou ; Lingfeng Mo
Author_Institution :
Dept. of Electr. & Comput. Eng., Clarkson Univ., Potsdam, NY, USA
fYear :
2013
fDate :
22-28 Sept. 2013
Firstpage :
60
Lastpage :
69
Abstract :
Text categorization, the automatic labeling of natural language text with pre-defined semantic categories, is an essential task for managing abundant online data. An example of such data in Software Engineering is the large, ever-growing volume of forum discussions on how to use particular APIs. We have conducted a study to explore how well machine learning algorithms can categorize API discussions based on their content. We ask two questions: (1) Can a relatively straightforward algorithm such as Naive Bayes work sufficiently well for this task? (2) If yes, how can we control its performance? We achieved a best test accuracy mean (TAM) of 94.1% with our largest training data set for the AWT/Swing API, which consists of 833 forum discussions distributed over eight categories/topics. We have also investigated factors that impact classification accuracy, the two most important being the size of the training set and the presence of multi-label documents (discussions that involve more than one category).
Keywords :
application program interfaces; learning (artificial intelligence); text analysis; API discussions; AWT-Swing API; Naive Bayes; TAM; automatic natural language text labeling; content categorization; machine learning algorithms; multilabel documents; online data management; pre-defined semantic category; software engineering; test accuracy mean; text categorization; training data set; Accuracy; Machine learning algorithms; Mathematical model; Message systems; Software; Training; Training data; APIs; AWT/Swing; MALLET; Naive Bayes; Online Forums; Text Categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Maintenance (ICSM), 2013 29th IEEE International Conference on
Conference_Location :
Eindhoven
ISSN :
1063-6773
Type :
conf
DOI :
10.1109/ICSM.2013.17
Filename :
6676877