• DocumentCode
    644079
  • Title

    A Replicated Comparative Study of Source Code Authorship Attribution

  • Author

    Tennyson, Matthew F.

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Syst., Bradley Univ., Peoria, IL, USA
  • fYear
    2013
  • fDate
    9 Oct. 2013
  • Firstpage
    76
  • Lastpage
    83
  • Abstract
    Source code authorship attribution is the task of deciding who wrote a piece of software given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. Several methods of source code authorship attribution have been proposed. According to the only known controlled, comprehensive comparative study of these methods, the two most effective are the Burrows method and the SCAP method. This paper presents a partial replication of that comparative study: it compares only those two methods. It also slightly extends the study: the original considered only anonymized data, while the replication considers both anonymized and non-anonymized data. The original study indicated that the Burrows method outperformed all other methods, including the SCAP method, by a considerable margin. However, the results of the replication indicate that the SCAP method outperforms the Burrows method by a small margin on anonymized data and by a large margin on non-anonymized data.
  • Keywords
    authoring systems; digital forensics; Burrows method; SCAP method; anonymized data; nonanonymized data; partial replicated comparative study; plagiarism detection; software forensics; software ownership; source code author profile; source code authorship attribution; Forensics; Java; Open source software; Plagiarism; Programming; authorship attribution; information retrieval
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Replication in Empirical Software Engineering Research (RESER), 2013 3rd International Workshop on
  • Conference_Location
    Baltimore, MD
  • Type

    conf

  • DOI
    10.1109/RESER.2013.12
  • Filename
    6664734