DocumentCode
2332976
Title
File cloning in open source Java projects: The good, the bad, and the ugly
Author
Ossher, Joel ; Sajnani, Hitesh ; Lopes, Cristina
Author_Institution
Donald Bren Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, CA, USA
fYear
2011
fDate
25-30 Sept. 2011
Firstpage
283
Lastpage
292
Abstract
We present a study of the extent to which developers copy entire files or sets of files into their applications with little or no modification. Our aim is to determine the prevalence of such activity within open source Java development, and to identify the circumstances under which files are reused in this manner. To accomplish this aim, we developed a novel method of file-level code clone detection that is scalable to millions of files. We applied our method to the Sourcerer Repository, which contains over 13,000 Java projects aggregated from multiple open source repositories. Our method detected that in excess of 10% of files are clones, and that over 15% of all projects contain at least one cloned file. In addition to computing these raw numbers, we manually examined a large number of the reported clones. We found the most commonly cloned files to be Java extension classes and popular third-party libraries, both large and small. We also discovered a number of projects that occur in multiple online repositories, have been forked, or were divided into multiple subprojects.
Keywords
Java; data flow analysis; public domain software; software libraries; software reusability; Java extension classes; Sourcerer Repository; file cloning; file-level code clone detection; multiple online repositories; multiple open source repositories; open source Java development; open source Java projects; popular third-party libraries; Cloning; Google; Indexes; Java; Libraries; Linux; Software
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Maintenance (ICSM), 2011 27th IEEE International Conference on
Conference_Location
Williamsburg, VA
ISSN
1063-6773
Print_ISBN
978-1-4577-0663-9
Electronic_ISBN
1063-6773
Type
conf
DOI
10.1109/ICSM.2011.6080795
Filename
6080795