  • DocumentCode
    670221
  • Title
    Selective chunking — Easy and effective way to estimate text similarity
  • Author
    Kucecka, Tomas ; Chuda, Daniela ; Samuhel, Patrik
  • Author_Institution
    Fac. of Inf. & Inf. Technol., Slovak Univ. of Technol., Bratislava, Slovakia
  • fYear
    2013
  • fDate
    19-21 Nov. 2013
  • Firstpage
    381
  • Lastpage
    385
  • Abstract
    Plagiarism is a serious problem, especially in academic environments. We define it as the theft of somebody else's work or ideas. In this paper we focus on plagiarism in student assignments written in natural language. We propose an approach that should identify copied fragments of text faster and more accurately than standard approaches. We first identify topic-related pairs of text documents and then select for further processing only those pairs that discuss a similar topic. We experimented with different chunking methods in the comparison process to overcome typical problems, such as short fragments of text copied from other documents. The results show that our approach is more suitable for plagiarism detection than a standard n-gram method.
  • Keywords
    educational administrative data processing; natural language processing; text analysis; natural language; plagiarism detection; selective chunking; similar topic; standard n-gram method; student assignments; text documents; text similarity estimation; Informatics; Plagiarism; Standards; Time complexity; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Informatics (CINTI), 2013 IEEE 14th International Symposium on
  • Conference_Location
    Budapest
  • Print_ISBN
    978-1-4799-0194-4
  • Type
    conf
  • DOI
    10.1109/CINTI.2013.6705226
  • Filename
    6705226