• DocumentCode
    2328758
  • Title
    Internet-scale Real-time Code Clone Search Via Multi-level Indexing
  • Author
    Keivanloo, Iman; Rilling, Juergen; Charland, Philippe
  • Author_Institution
    Department of Computer Science and Software Engineering, Concordia University, Montreal, QC, Canada
  • fYear
    2011
  • fDate
    17-20 Oct. 2011
  • Firstpage
    23
  • Lastpage
    27
  • Abstract
    Finding lines of code similar to a given code fragment across large knowledge bases in fractions of a second is a new branch of code clone research, also known as real-time code clone search. The requirements that real-time code clone search has to meet include scalability, short response time, scalable incremental corpus updates, and support for type-1, type-2, and type-3 clones. We conducted a set of empirical studies on a large open source code corpus to gain insight into its characteristics. We used these results to design and optimize a multi-level indexing approach that combines hash table-based lookup and binary search to improve the response time of Internet-scale real-time code clone search. Finally, we performed an evaluation on an Internet-scale corpus (1.5 million Java files and 266 MLOC). Our approach maintains a response time in the microsecond range for 99.9% of clone searches while supporting the aforementioned requirements.
  • Keywords
    Internet; data structures; indexing; public domain software; search problems; software engineering; Internet scale real time code clone search; binary search; code fragment; hash table; multilevel indexing; open source code corpus; scalable incremental corpus updates; Cloning; Complexity theory; Indexing; Real time systems; Software engineering; Time factors; Code clone; Internet-scale code search; code clone detection; code clone search; real-time search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2011 18th Working Conference on Reverse Engineering (WCRE)
  • Conference_Location
    Limerick
  • ISSN
    1095-1350
  • Print_ISBN
    978-1-4577-1948-6
  • Type
    conf
  • DOI
    10.1109/WCRE.2011.13
  • Filename
    6079771
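
The abstract mentions a multi-level index that combines hash-table lookup with binary search, but the record gives no further detail. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the names `line_hash` and `TwoLevelIndex`, the whitespace-stripping normalization, the use of MD5, and the choice of the top 8 hash bits as the first-level key.

```python
import bisect
import hashlib
from collections import defaultdict


def line_hash(line: str) -> int:
    """Hash one source line to a 64-bit integer.

    Assumed normalization: all whitespace is removed, so lines that
    differ only in spacing share a hash. This is a crude stand-in for
    real clone normalization.
    """
    normalized = "".join(line.split())
    digest = hashlib.md5(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TwoLevelIndex:
    """Level 1: hash table keyed by the top bits of the line hash.
    Level 2: a sorted list per bucket, searched with binary search."""

    def __init__(self, bucket_bits: int = 8):
        self.shift = 64 - bucket_bits
        # bucket id -> sorted list of (hash, file_id, line_number)
        self.buckets = defaultdict(list)

    def add_file(self, file_id: str, lines) -> None:
        """Incrementally index one file; blank lines are skipped."""
        for number, line in enumerate(lines, start=1):
            if not line.strip():
                continue
            h = line_hash(line)
            # insort keeps the bucket sorted after each insertion.
            bisect.insort(self.buckets[h >> self.shift], (h, file_id, number))

    def lookup(self, line: str):
        """Return (file_id, line_number) pairs whose normalized line matches."""
        h = line_hash(line)
        bucket = self.buckets.get(h >> self.shift, [])
        # (h,) sorts before every (h, file_id, line_number) entry, so
        # bisect_left lands on the first entry with this hash, if any.
        pos = bisect.bisect_left(bucket, (h,))
        hits = []
        while pos < len(bucket) and bucket[pos][0] == h:
            hits.append(bucket[pos][1:])
            pos += 1
        return hits


index = TwoLevelIndex()
index.add_file("A.java", ["int x = 0;", "", "return x;"])
index.add_file("B.java", ["int  x=0;"])
matches = index.lookup("int x = 0;")
```

In this sketch the first level narrows a query to one bucket in constant expected time, and the second level locates matching entries in logarithmic time within that bucket. New files can be added without rebuilding the index, although inserting into a Python list is linear in the bucket size, so a corpus at the scale described in the abstract would need a different second-level structure.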