DocumentCode
670221
Title
Selective chunking — Easy and effective way to estimate text similarity
Author
Kucecka, Tomas ; Chuda, Daniela ; Samuhel, Patrik
Author_Institution
Fac. of Inf. & Inf. Technol., Slovak Univ. of Technol., Bratislava, Slovakia
fYear
2013
fDate
19-21 Nov. 2013
Firstpage
381
Lastpage
385
Abstract
Plagiarism is a serious problem, especially in the academic environment. We define it as the theft of somebody else's work or ideas. In this paper we focus on plagiarism in the domain of student assignments written in natural language. We propose an approach that should identify copied fragments of text data faster and better than standard approaches. We first identify topic-related pairs of text documents and then select for further processing only those pairs that discuss a similar topic. We experimented with different chunking methods in the comparison process to overcome typical problems such as shorter fragments of text copied from other documents. The results show that our approach is more suitable for plagiarism detection than a standard n-gram method.
Keywords
educational administrative data processing; natural language processing; text analysis; natural language; plagiarism detection; selective chunking; similar topic; standard n-gram method; student assignments; text documents; text similarity estimation; Informatics; Plagiarism; Standards; Time complexity; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Informatics (CINTI), 2013 IEEE 14th International Symposium on
Conference_Location
Budapest
Print_ISBN
978-1-4799-0194-4
Type
conf
DOI
10.1109/CINTI.2013.6705226
Filename
6705226