• DocumentCode
    2332976
  • Title
    File cloning in open source Java projects: The good, the bad, and the ugly
  • Author
    Ossher, Joel ; Sajnani, Hitesh ; Lopes, Cristina
  • Author_Institution
    Donald Bren Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, CA, USA
  • fYear
    2011
  • fDate
    25-30 Sept. 2011
  • Firstpage
    283
  • Lastpage
    292
  • Abstract
    We present a study of the extent to which developers copy entire files or sets of files into their applications with little or no modification. Our aim is to determine the prevalence of such activity within open source Java development, and to identify the circumstances under which files are reused in this manner. To accomplish this aim, we developed a novel method of file-level code clone detection that is scalable to millions of files. We applied our method to the Sourcerer Repository, which contains over 13,000 Java projects aggregated from multiple open source repositories. Our method detected that in excess of 10% of files are clones, and that over 15% of all projects contain at least one cloned file. In addition to computing these raw numbers, we manually examined a large number of the reported clones. We found the most commonly cloned files to be Java extension classes and popular third-party libraries, both large and small. We also discovered a number of projects that occur in multiple online repositories, have been forked, or were divided into multiple subprojects.
  • Keywords
    Java; data flow analysis; public domain software; software libraries; software reusability; Java extension classes; Sourcerer Repository; file cloning; file-level code clone detection; multiple online repositories; multiple open source repositories; open source Java development; open source Java projects; popular third-party libraries; Cloning; Google; Indexes; Java; Libraries; Linux; Software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Software Maintenance (ICSM), 2011 27th IEEE International Conference on
  • Conference_Location
    Williamsburg, VA
  • ISSN
    1063-6773
  • Print_ISBN
    978-1-4577-0663-9
  • Electronic_ISBN
    1063-6773
  • Type
    conf
  • DOI
    10.1109/ICSM.2011.6080795
  • Filename
    6080795