  • DocumentCode
    77634
  • Title
    The Impact of Classifier Configuration and Classifier Combination on Bug Localization

  • Author
    Thomas, Stephen W.; Nagappan, Meiyappan; Blostein, Dorothea; Hassan, Ahmed E.

  • Author_Institution
    Sch. of Comput., Queen's Univ., Kingston, ON, Canada
  • Volume
    39
  • Issue
    10
  • fYear
    2013
  • fDate
    Oct. 2013
  • Firstpage
    1427
  • Lastpage
    1443
  • Abstract
    Bug localization is the task of determining which source code entities are relevant to a bug report. Manual bug localization is labor-intensive because developers must consider thousands of source code entities. Current research builds bug localization classifiers, based on information retrieval models, to locate entities that are textually similar to the bug report. However, current research does not consider the effect of classifier configuration, i.e., all the parameter values that specify the behavior of a classifier. As such, neither the effect of each parameter nor the parameter values that lead to the best performance are known. In this paper, we empirically investigate the effectiveness of a large space of classifier configurations, 3,172 in total. Further, because classifier combination has shown promise in other domains, we introduce a framework for combining the results of multiple classifier configurations. Through a detailed case study on over 8,000 bug reports from three large-scale projects, we make two main contributions. First, we show that the parameters of a classifier have a significant impact on its performance. Second, we show that combining multiple classifiers, whether hand-picked or randomly chosen from intelligently defined subspaces of classifiers, improves the performance of even the best individual classifiers.
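As a rough illustration of the ideas the abstract summarizes, here is a sketch that ranks source code entities against a bug report with a vector space model (VSM, one of the IR models named in the keywords). It uses tf-idf weights and cosine similarity, treats two preprocessing and weighting choices as the "classifier configuration", and then combines the rankings produced by several configurations. This is a minimal sketch under stated assumptions, not the authors' implementation. The `Config` fields, the function names, the smoothed idf formula, the toy entities, and the use of a Borda count for combination are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical parameters of one VSM classifier configuration."""
    split_camel_case: bool = True  # split identifiers such as getUserName
    use_idf: bool = True           # tf-idf weights instead of raw term counts


def tokenize(text: str, cfg: Config) -> list[str]:
    """Split text into lowercase terms, optionally breaking camelCase identifiers."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if cfg.split_camel_case:
            word = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", word)
        tokens.extend(part.lower() for part in word.split())
    return tokens


def weigh(tokens: list[str], df: Counter, n_docs: int, cfg: Config) -> dict[str, float]:
    """Term weights: raw frequency, optionally scaled by a smoothed idf (an assumption)."""
    tf = Counter(tokens)
    if not cfg.use_idf:
        return {t: float(c) for t, c in tf.items()}
    return {t: c * math.log((n_docs + 1) / (df[t] + 1)) for t, c in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank(entities: dict[str, str], bug_report: str, cfg: Config) -> list[str]:
    """Rank entity names by similarity to the bug report; ties are broken by name."""
    docs = {name: tokenize(text, cfg) for name, text in entities.items()}
    df = Counter(t for toks in docs.values() for t in set(toks))  # document frequency
    vectors = {name: weigh(toks, df, len(docs), cfg) for name, toks in docs.items()}
    query = weigh(tokenize(bug_report, cfg), df, len(docs), cfg)
    return sorted(entities, key=lambda name: (-cosine(vectors[name], query), name))


def borda(rankings: list[list[str]]) -> list[str]:
    """Combine rankings: position i in a list of length n earns n - i points."""
    points = Counter()
    for ranking in rankings:
        for i, name in enumerate(ranking):
            points[name] += len(ranking) - i
    return sorted(points, key=lambda name: (-points[name], name))


# Hypothetical toy corpus and bug report.
entities = {
    "UserService.java": "class UserService { User getUserName(int id) { ... } }",
    "LoginForm.java": "class LoginForm { void submitPassword(String pwd) { ... } }",
    "ReportWriter.java": "class ReportWriter { void writeCsv(List rows) { ... } }",
}
bug_report = "Crash in getUserName when user id is missing"
configs = [Config(True, True), Config(False, True), Config(True, False)]
combined = borda([rank(entities, bug_report, cfg) for cfg in configs])
print(combined)  # ['UserService.java', 'LoginForm.java', 'ReportWriter.java']
```

With these toy inputs every configuration already puts `UserService.java` first, so the combined ranking matches each individual one; a rank-fusion step like the Borda count only changes the outcome when configurations disagree. Other fusion schemes, such as summing normalized similarity scores, would fit in the same place.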
  • Keywords
    information retrieval; pattern classification; program debugging; bug localization classifiers; bug report; classifier combination; classifier configuration; classifier parameter; information retrieval models; parameter value; source code entity determination; Indexes; Information retrieval; Large scale integration; Matrix decomposition; Measurement; Resource management; Vectors; LDA; LSI; Software maintenance; VSM; bug localization; classifier combination; information retrieval;
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type
    jour
  • DOI
    10.1109/TSE.2013.27
  • Filename
    6520844