  • DocumentCode
    1381353
  • Title
    An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis
  • Author
    Cosma, G.; Joy, M.
  • Author_Institution
    P.A. Coll., Larnaca, Cyprus
  • Volume
    61
  • Issue
    3
  • fYear
    2012
  • fDate
    3/1/2012 12:00:00 AM
  • Firstpage
    379
  • Lastpage
    394
  • Abstract
    Plagiarism is a growing problem in academia. Academics often use plagiarism detection tools to detect similar source-code files. Once similar files are detected, the academic proceeds with the investigation process, which involves identifying the similar source-code fragments within them that could be used as evidence for proving plagiarism. This paper describes PlaGate, a novel tool that can be integrated with existing plagiarism detection tools to improve plagiarism detection performance. The tool also implements a new approach for investigating the similarity between source-code files with a view to gathering evidence for proving plagiarism. Graphical evidence is presented that allows for the investigation of source-code fragments with regard to their contribution toward evidence for proving plagiarism. The graphical evidence indicates the relative importance of the given source-code fragments across files in a corpus. This is done by using the Latent Semantic Analysis information retrieval technique to detect how important the fragments are within the specific files under investigation in relation to other files in the corpus.
  • Keywords
    file organisation; information retrieval; security of data; source coding; PlaGate; academia; academics; latent semantic analysis; source-code files; source-code plagiarism detection; Matrix decomposition; Plagiarism; Programming; Semantics; Software; Source-code similarity detection; similarity investigation tool;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Computers
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2011.223
  • Filename
    6086533