• DocumentCode
    650686
  • Title
    Content Categorization of API Discussions
  • Author
    Daqing Hou ; Lingfeng Mo
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Clarkson Univ., Potsdam, NY, USA
  • fYear
    2013
  • fDate
    22-28 Sept. 2013
  • Firstpage
    60
  • Lastpage
    69
  • Abstract
    Text categorization, the automatic labeling of natural language text with pre-defined semantic categories, is an essential task for managing abundant online data. One example of such data in software engineering is the large, ever-growing volume of forum discussions on how to use particular APIs. We conducted a study of how well machine learning algorithms can categorize API discussions based on their content. We address two questions: (1) Can a relatively straightforward algorithm such as Naive Bayes work sufficiently well for this task? (2) If so, how can its performance be controlled? Our best test accuracy mean (TAM) was 94.1%, achieved with our largest training data set for the AWT/Swing API, which consists of 833 forum discussions distributed over eight categories (topics). We also investigated factors that affect classification accuracy; the two most important are the size of the training set and the presence of multi-label documents (discussions that involve more than one category).
  • Keywords
    application program interfaces; learning (artificial intelligence); text analysis; API discussions; AWT-Swing API; Naive Bayes; TAM; automatic natural language text labeling; content categorization; machine learning algorithms; multilabel documents; online data management; pre-defined semantic category; software engineering; test accuracy mean; text categorization; training data set; Accuracy; Machine learning algorithms; Mathematical model; Message systems; Software; Training; Training data; APIs; AWT/Swing; MALLET; Naive Bayes; Online Forums; Text Categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Software Maintenance (ICSM), 2013 29th IEEE International Conference on
  • Conference_Location
    Eindhoven
  • ISSN
    1063-6773
  • Type
    conf
  • DOI
    10.1109/ICSM.2013.17
  • Filename
    6676877