DocumentCode :
2343570
Title :
A Probabilistic Approach to Source Code Authorship Identification
Author :
Kothari, Jay ; Shevertalov, Maxim ; Stehle, Edward ; Mancoridis, Spiros
Author_Institution :
Dept. of Comput. Sci., Drexel Univ., Philadelphia, PA
fYear :
2007
fDate :
2-4 April 2007
Firstpage :
243
Lastpage :
248
Abstract :
There exists a need for tools to help identify the authorship of source code. This includes situations in which the ownership of code is questionable, such as in plagiarism or intellectual property infringement disputes. Authorship identification can also be used to assist in the apprehension of the creators of malware. In this paper, we present an approach to identifying the authors of source code. We begin by computing a set of metrics to build profiles for a population of known authors using code samples that are verified to be authentic. We then compute metrics on unidentified source code to determine the closest matching profile. We demonstrate our approach on a case study that involves two kinds of software: one based on open source developers working on various projects, and another based on students working on assignments with the same requirements. In our case study, we are able to determine authorship with greater than 70% accuracy in choosing the single nearest match and greater than 90% accuracy in choosing the top three ordered nearest matches.
Keywords :
authorisation; computer viruses; intellectual property; malware; probabilistic approach; source code authorship identification; Computer science; Databases; Filtering; Guidelines; Intellectual property; Law; Legal factors; Open source software; Pattern matching; Plagiarism
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology, 2007. ITNG '07. Fourth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-2776-0
Type :
conf
DOI :
10.1109/ITNG.2007.17
Filename :
4151691