Title :
On the Use of Discretized Source Code Metrics for Author Identification
Author :
Shevertalov, Maxim ; Kothari, Jay ; Stehle, Edward ; Mancoridis, Spiros
Author_Institution :
Dept. of Comput. Sci., Drexel Univ., Philadelphia, PA
Abstract :
Intellectual property infringement and plagiarism litigation involving source code would be more easily resolved using code authorship identification tools. Previous efforts in this area have demonstrated the potential of determining the authorship of a disputed piece of source code automatically. This was achieved by using source code metrics to build a database of developer profiles, thus characterizing a population of developers. These profiles were then used to determine the likelihood that the unidentified source code was authored by a given developer. In this paper, we evaluate the effect of discretizing source code metrics for use in building developer profiles. It is well known that machine learning techniques perform better when using categorical variables as opposed to continuous ones. We present a genetic algorithm to discretize metrics to improve source-code-to-author classification. We evaluate the approach with a case study involving 20 open-source developers and over 750,000 lines of Java source code.
Keywords :
genetic algorithms; industrial property; learning (artificial intelligence); pattern classification; software metrics; Java source code; author classification; author identification; categorical variables; code authorship identification tools; developer profiles; discretized source code metrics; genetic algorithm; intellectual property infringement; machine learning techniques; plagiarism litigation; Computer languages; Databases; Genetic algorithms; Histograms; Java; Machine learning; Machine learning algorithms; Plagiarism; Software engineering; Statistical analysis; authorship identification; forensic analysis; plagiarism; search based software engineering; software engineering; software forensics; source code authorship
Conference_Title :
Search Based Software Engineering, 2009 1st International Symposium on
Conference_Location :
Windsor
Print_ISBN :
978-0-7695-3675-0
DOI :
10.1109/SSBSE.2009.18