Author_Institution :
Comput. Sci., Univ. of Auckland, Auckland, New Zealand
Abstract :
Research on code clones requires information about code clones. For example, if the research aims to improve clone detection, this information would be used to validate detectors or to provide a benchmark for comparing different detectors. If the research is on techniques for managing clones, the information would instead be used as input to those techniques. Typically, researchers have to produce clone information themselves, even when doing so is not the main focus of their research. If such information were made available, they could use their time more efficiently. If it were also usefully organised and its quality clearly identified, that is, if the information were curated, the quality of the research would improve as well. In this paper, I describe the beginnings of a curated source of information about a collection of code clones from the Qualitas Corpus. I describe how this information is currently organised, discuss how it might be used, and propose directions it might take in the future. The collection currently includes 1.3M method-level clone-pairs from 109 different open source Java systems, covering approximately 5.6M lines of code.
Keywords :
Java; public domain software; software engineering; 1.3M method-level clone-pairs; Qualitas corpus; clone detection; clone management; code clones; curated information source; open source Java systems; software clones; Accuracy; Benchmark testing; Cloning; Detectors; Software; Terminology; Code Analysis; Code Clones; Corpus; Empirical Studies