• DocumentCode
    693241
  • Title

    Combination of VSM and Jaccard coefficient for external plagiarism detection

  • Author

    Shuai Wang ; Haoliang Qi ; Leilei Kong ; Cuixia Nu

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Heilongjiang Inst. of Technol., Harbin, China
  • Volume
    04
  • fYear
    2013
  • fDate
    14-17 July 2013
  • Firstpage
    1880
  • Lastpage
    1885
  • Abstract
    Detailed comparison is an important sub-task of external plagiarism detection, and a seed heuristic between two documents is often used for it. The vector space model (VSM) and the Jaccard coefficient are both commonly used in plagiarism detection: VSM can produce high recall, while the Jaccard coefficient can produce high precision. In this paper, we propose a hybrid similarity measure that integrates VSM and the Jaccard coefficient into a unified model, based on a fitting function of the optimal dividing line between plagiarism and non-plagiarism. The method makes full use of the advantages of both measures and can extract more reasonable heuristic seeds for plagiarism detection. It is evaluated on the PAN corpus of CLEF (Cross-Language Evaluation Forum) and compared with methods based on VSM or the Jaccard coefficient alone. Experimental results show that our method achieves better performance.
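    The two base measures named in the abstract can be sketched as follows. This is a minimal illustration only, assuming whitespace tokenization and raw term-frequency vectors; the function names are hypothetical, and the paper's fitting function for combining the two scores is not given in the abstract, so it is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing is not described in the abstract.
    return text.lower().split()


def cosine_similarity(a, b):
    # VSM similarity: cosine of the angle between term-frequency vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    # Jaccard coefficient: shared distinct terms over all distinct terms.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)
```

    Cosine similarity counts repeated terms and normalizes by vector length, whereas the Jaccard coefficient only looks at the overlap of distinct terms, so a sentence pair can score quite differently under the two measures.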
  • Keywords
    document handling; natural language processing; security of data; vectors; CLEF; Jaccard coefficient; VSM; cross-language evaluation forum; external plagiarism detection; fitting function; hybrid similarity measure model; vector space model; Abstracts; Plagiarism; Tin; Detailed comparison; Seed heuristic
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
  • Conference_Location
    Tianjin
  • Type

    conf

  • DOI
    10.1109/ICMLC.2013.6890902
  • Filename
    6890902