• DocumentCode
    2343570
  • Title
    A Probabilistic Approach to Source Code Authorship Identification
  • Author
    Kothari, Jay ; Shevertalov, Maxim ; Stehle, Edward ; Mancoridis, Spiros
  • Author_Institution
    Dept. of Comput. Sci., Drexel Univ., Philadelphia, PA
  • fYear
    2007
  • fDate
    2-4 April 2007
  • Firstpage
    243
  • Lastpage
    248
  • Abstract
    There exists a need for tools to help identify the authorship of source code. This includes situations in which the ownership of code is questionable, such as in plagiarism or intellectual property infringement disputes. Authorship identification can also be used to assist in the apprehension of the creators of malware. In this paper, we present an approach to identifying the authors of source code. We begin by computing a set of metrics to build profiles for a population of known authors using code samples that are verified to be authentic. We then compute metrics on unidentified source code to determine the closest matching profile. We demonstrate our approach on a case study that involves two kinds of software: one based on open source developers working on various projects, and another based on students working on assignments with the same requirements. In our case study, we are able to determine authorship with greater than 70% accuracy in choosing the single nearest match and greater than 90% accuracy in choosing the top three ordered nearest matches.
  • Keywords
    authorisation; computer viruses; intellectual property; malware; probabilistic approach; source code authorship identification; Computer science; Databases; Filtering; Guidelines; Intellectual property; Law; Legal factors; Open source software; Pattern matching; Plagiarism
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology, 2007. ITNG '07. Fourth International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    0-7695-2776-0
  • Type
    conf
  • DOI
    10.1109/ITNG.2007.17
  • Filename
    4151691