• DocumentCode
    168299
  • Title
    Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space
  • Author
    Meuschke, Norman; Gipp, Bela
  • Author_Institution
    National Institute of Informatics, Tokyo, Japan
  • fYear
    2014
  • fDate
    8-12 Sept. 2014
  • Firstpage
    197
  • Lastpage
    200
  • Abstract
    This paper proposes a hybrid approach to plagiarism detection in academic documents. It combines detection methods based on citations, semantic argument structure, and semantic word similarity with character-based methods to achieve higher detection performance for disguised forms of plagiarism. Currently available plagiarism detection software performs only text string comparisons. These systems find copies but fail to identify disguised plagiarism, such as paraphrases, translations, or idea plagiarism. Detection approaches that consider semantic similarity at the word and sentence level exist and have consistently achieved higher accuracy for disguised plagiarism than character-based approaches. However, the high computational effort of these semantic approaches makes them infeasible for real-world plagiarism detection scenarios. The proposed hybrid approach uses citation-based methods as a preliminary heuristic to reduce the retrieval space with relatively little loss in detection accuracy. This preliminary step can then be followed by a computationally more expensive semantic and character-based analysis. We show that such a hybrid approach makes semantic plagiarism detection feasible, for the first time, even on large collections.
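    The two-stage idea in the abstract (a cheap citation-based filter that shrinks the retrieval space, then an expensive comparison on the survivors only) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: bibliographic coupling (counting shared references) is assumed as the citation heuristic, and `word_jaccard` is a hypothetical word-overlap stand-in for the semantic and character-based analysis. All function names, parameters (`min_shared`, `threshold`), and the toy data are made up for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple


def coupling_strength(a: Set[str], b: Set[str]) -> int:
    """Bibliographic coupling strength: number of references two documents share."""
    return len(a & b)


def citation_prefilter(
    query_refs: Set[str], collection_refs: Dict[str, Set[str]], min_shared: int = 2
) -> List[str]:
    """Cheap stage (assumed heuristic): keep documents sharing at least `min_shared`
    references with the query, ordered by descending coupling strength, ties by id."""
    scored = [(doc_id, coupling_strength(query_refs, refs))
              for doc_id, refs in collection_refs.items()]
    kept = [pair for pair in scored if pair[1] >= min_shared]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in kept]


def word_jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for the expensive semantic/character-based comparison."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def hybrid_detect(
    query_text: str,
    query_refs: Set[str],
    collection_texts: Dict[str, str],
    collection_refs: Dict[str, Set[str]],
    similarity: Callable[[str, str], float] = word_jaccard,
    min_shared: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Run the expensive comparison only on documents that survive the citation prefilter."""
    results = []
    for doc_id in citation_prefilter(query_refs, collection_refs, min_shared):
        score = similarity(query_text, collection_texts[doc_id])
        if score >= threshold:
            results.append((doc_id, score))
    return results


# Toy collection (made-up data for illustration only).
REFS = {"d1": {"r1", "r2", "r3"}, "d2": {"r4"}, "d3": {"r1", "r2", "r5"}}
TEXTS = {
    "d1": "citation patterns reveal disguised plagiarism",
    "d2": "unrelated text about handheld computers",
    "d3": "completely different wording here",
}

if __name__ == "__main__":
    query_refs = {"r1", "r2", "r3", "r9"}
    query_text = "citation patterns can reveal disguised plagiarism"
    print(citation_prefilter(query_refs, REFS))  # ['d1', 'd3']; d2 is never compared
    print(hybrid_detect(query_text, query_refs, TEXTS, REFS))
```

    In this toy run, only two of the three documents reach the expensive stage, which is where the savings would come from on a large collection; the trade-off is that a plagiarized document sharing too few references with the query is never examined, which corresponds to the "relatively low loss in detection accuracy" the abstract mentions.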
  • Keywords
    citation analysis; information retrieval; semantic Web; academic documents; character-based analysis; character-based methods; citation characteristics; citation-based methods; plagiarism detection methods; real-world plagiarism detection scenarios; retrieval space; semantic argument structure; semantic word similarity; sentence level; Algorithm design and analysis; Citation analysis; Couplings; Handheld computers; Plagiarism; Semantics; Text analysis; Citation Analysis; Disguised Plagiarism; Information Retrieval; Large Scale Collections; Plagiarism Detection; Semantic Analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/JCDL.2014.6970168
  • Filename
    6970168