  • DocumentCode
    2443056
  • Title
    Detecting similar software applications
  • Author
    McMillan, Collin ; Grechanik, Mark ; Poshyvanyk, Denys
  • Author_Institution
    Coll. of William & Mary, Williamsburg, VA, USA
  • fYear
    2012
  • fDate
    2-9 June 2012
  • Firstpage
    364
  • Lastpage
    374
  • Abstract
    Although popular text search engines allow users to retrieve similar web pages, source code search engines do not have this feature. Detecting similar applications is a notoriously difficult problem, since it implies that similar high-level requirements and their low-level implementations can be detected and matched automatically for different applications. We created a novel approach for automatically detecting Closely reLated ApplicatioNs (CLAN) that helps users detect similar applications for a given Java application. Our main contributions are an extension to a framework of relevance and a novel algorithm that computes a similarity index between Java applications using the notion of semantic layers that correspond to packages and class hierarchies. We have built CLAN and we conducted an experiment with 33 participants to evaluate CLAN and compare it with the closest competitive approach, MUDABlue. The results show with strong statistical significance that CLAN automatically detects similar applications from a large repository of 8,310 Java applications with a higher precision than MUDABlue.
  • Keywords
    Java; software engineering; statistical analysis; CLAN; Java application; MUDABlue; Web page retrieval; class hierarchy; closely related application detection; packages hierarchy; semantic layers; similar software application detection; source code search engines; text search engines; Large scale integration; Search engines; Semantics; Software; Time division multiplexing; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering (ICSE), 2012 34th International Conference on
  • Conference_Location
    Zurich
  • ISSN
    0270-5257
  • Print_ISBN
    978-1-4673-1066-6
  • Electronic_ISBN
    0270-5257
  • Type
    conf
  • DOI
    10.1109/ICSE.2012.6227178
  • Filename
    6227178