• DocumentCode
    1300787
  • Title
    Exemplar: A Source Code Search Engine for Finding Highly Relevant Applications
  • Author
    McMillan, Collin ; Grechanik, Mark ; Poshyvanyk, Denys ; Fu, Chen ; Xie, Qing
  • Author_Institution
    Dept. of Comput. Sci., Coll. of William & Mary, Williamsburg, VA, USA
  • Volume
    38
  • Issue
    5
  • fYear
    2012
  • Firstpage
    1069
  • Lastpage
    1087
  • Abstract
    A fundamental problem of finding software applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and low-level implementation details of applications. To reduce this mismatch we created an approach called EXEcutable exaMPLes ARchive (Exemplar) for finding highly relevant software projects from large archives of applications. After a programmer enters a natural-language query that contains high-level concepts (e.g., MIME, datasets), Exemplar retrieves applications that implement these concepts. Exemplar ranks applications in three ways. First, we consider the descriptions of applications. Second, we examine the Application Programming Interface (API) calls used by applications. Third, we analyze the dataflow among those API calls. We performed two case studies (with professional and student developers) to evaluate how these three rankings contribute to the quality of the search results from Exemplar. The results of our studies show that the combined ranking of application descriptions and API documents yields the most-relevant search results. We released Exemplar and our case study data to the public.
  • Keywords
    application program interfaces; data flow analysis; document handling; natural language processing; project management; query processing; software management; software reusability; system documentation; API call; API document; Exemplar; application description ranking; application programming interface; dataflow; development task; executable examples archive; natural-language query; search quality; software application; software project; software reuse; source code search engine; Cryptography; Data mining; Engines; Java; Search engines; Software; Vocabulary; Source code search engines; concept location; information retrieval; mining software repositories; open source software
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type
    jour
  • DOI
    10.1109/TSE.2011.84
  • Filename
    5989838