DocumentCode :
693241
Title :
Combination of VSM and Jaccard coefficient for external plagiarism detection
Author :
Shuai Wang ; Haoliang Qi ; Leilei Kong ; Cuixia Nu
Author_Institution :
Sch. of Comput. Sci. & Technol., Heilongjiang Inst. of Technol., Harbin, China
Volume :
04
fYear :
2013
fDate :
14-17 July 2013
Firstpage :
1880
Lastpage :
1885
Abstract :
Detailed comparison is an important sub-task of external plagiarism detection, and a seed heuristic between two documents is often used for it. The vector space model (VSM) and the Jaccard coefficient are commonly used in plagiarism detection: VSM yields high recall, while the Jaccard coefficient yields high precision. In this paper, we propose a hybrid similarity measure based on the fitting function of the optimal dividing line between plagiarism and non-plagiarism, which integrates VSM and the Jaccard coefficient into a unified model. Our method makes full use of the advantages of both VSM and the Jaccard coefficient and can extract more reasonable heuristic seeds for plagiarism detection. The method is evaluated on the PAN corpus of CLEF (Cross-Language Evaluation Forum) and compared with methods based on VSM or the Jaccard coefficient alone. Experimental results show that our method achieves better performance.
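For readers unfamiliar with the two base measures named in the abstract, the following is a minimal sketch of a term-frequency cosine similarity (a common way to score passages under VSM) and the Jaccard coefficient over token sets. The function names, the whitespace tokenizer, and the raw term-frequency weighting are illustrative assumptions, not the authors' implementation; the abstract does not give the fitting function that combines the two scores, so it is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def cosine_similarity(a, b):
    # VSM-style score: cosine of the angle between raw term-frequency vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    # Jaccard score: size of the shared vocabulary over the combined vocabulary.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)
```

Cosine similarity counts repeated terms, while the Jaccard coefficient looks only at which distinct terms are shared, so the two scores can rank the same passage pair differently.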
Keywords :
document handling; natural language processing; security of data; vectors; CLEF; Jaccard coefficient; VSM; cross-language evaluation forum; external plagiarism detection; fitting function; hybrid similarity measure model; vector space model; Abstracts; Plagiarism; Tin; Detailed comparison; Seed heuristic
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
Type :
conf
DOI :
10.1109/ICMLC.2013.6890902
Filename :
6890902