• DocumentCode
    566590
  • Title

    A fingerprinting-based plagiarism detection system for Arabic text-based documents

  • Author

    Jadalla, Ameera ; Elnagar, Ashraf

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sharjah, Sharjah, United Arab Emirates
  • Volume
    1
  • fYear
    2012
  • fDate
    24-26 April 2012
  • Firstpage
    477
  • Lastpage
    482
  • Abstract
    This paper presents Iqtebas 1.0, a novel plagiarism detection system for Arabic text-based documents. This is a primary work dedicated to plagiarism detection in Arabic documents. Arabic is a morphologically rich language that is among the most widely used languages in the world and on the Internet. Given a document and a set of suspected files, our goal is to compute the originality value of the examined document. The originality value of a text is obtained by computing the distance between each sentence in the text and the closest sentence in the suspected files, if one exists. The proposed system structure is based on a search engine in order to reduce the cost of pairwise similarity computation. For the indexing process, we use the winnowing n-gram fingerprinting algorithm to reduce the index size. The fingerprints of each sentence are drawn from its n-grams, which are represented by hash codes; the winnowing algorithm computes these fingerprints for each sentence. As a result, the search time is improved and the detection process is accurate and robust. The experimental results showed superb performance of Iqtebas 1.0, which achieved a recall of 94% and a precision of 99%.
  • Keywords
    Internet; indexing; natural language processing; search engines; text analysis; Arabic text-based documents; Iqtebas 1.0; fingerprinting-based plagiarism detection system; indexing process; pairwise similarity; rich morphological language; search engine; winnowing n-gram fingerprinting algorithm; Educational institutions; Fingerprint recognition; Indexes; Plagiarism; Search engines; Vectors; Arabic; Plagiarism detection; fingerprinting techniques; text mining; text re-use
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing Technology and Information Management (ICCM), 2012 8th International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-0893-9
  • Type

    conf

  • Filename
    6268545
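
The pipeline outlined in the abstract (sentence-level winnowing fingerprints, a fingerprint index, and an originality value derived from the closest indexed sentence) can be illustrated with a minimal Python sketch. This is not the Iqtebas 1.0 implementation: the record does not give the n-gram length, the window size, the hash function, the distance measure, or how per-sentence distances are aggregated. The sketch therefore assumes character 5-grams, a window of 4, MD5 hashes, Jaccard distance over fingerprint sets, and a plain average over sentences. All function names are hypothetical, and no Arabic-specific normalization (such as diacritic removal) is attempted.

```python
import hashlib
from collections import defaultdict


def kgram_hashes(text, k=5):
    """Hash every character k-gram of the text after removing whitespace."""
    s = "".join(text.split())
    if not s:
        return []
    if len(s) < k:
        grams = [s]  # text shorter than k: treat it as a single gram
    else:
        grams = [s[i:i + k] for i in range(len(s) - k + 1)]
    # MD5 is an assumed stand-in for whatever hash the real system uses.
    return [int(hashlib.md5(g.encode("utf-8")).hexdigest(), 16) for g in grams]


def winnow(hashes, w=4):
    """Keep the minimum hash of each window of w consecutive hashes.

    Positions are dropped because only the set of selected values is
    needed here, so tie-breaking between equal minima does not matter.
    """
    if not hashes:
        return set()
    if len(hashes) <= w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint(sentence, k=5, w=4):
    """Winnowed fingerprint set of one sentence."""
    return winnow(kgram_hashes(sentence, k), w)


def build_index(source_sentences, k=5, w=4):
    """Inverted index: fingerprint hash -> ids of sentences containing it."""
    index = defaultdict(set)
    fingerprints = []
    for sid, sentence in enumerate(source_sentences):
        fp = fingerprint(sentence, k, w)
        fingerprints.append(fp)
        for h in fp:
            index[h].add(sid)
    return index, fingerprints


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def originality(doc_sentences, source_sentences, k=5, w=4):
    """Average distance from each sentence to its closest indexed sentence.

    1.0 means no sentence shares a fingerprint with the sources;
    0.0 means every sentence has an identical fingerprint set in the sources.
    """
    index, source_fps = build_index(source_sentences, k, w)
    distances = []
    for sentence in doc_sentences:
        fp = fingerprint(sentence, k, w)
        # Only sentences sharing at least one hash are compared,
        # which avoids a full pairwise scan.
        candidates = set()
        for h in fp:
            candidates |= index.get(h, set())
        best = max((jaccard(fp, source_fps[sid]) for sid in candidates),
                   default=0.0)
        distances.append(1.0 - best)
    return sum(distances) / len(distances) if distances else 1.0


if __name__ == "__main__":
    sources = ["اللغة العربية من أكثر اللغات استخداما في العالم"]
    # An exact copy of an indexed sentence has no originality.
    print(originality(sources, sources))  # 0.0
```

Looking up candidates through the inverted index mirrors the search-engine design mentioned in the abstract: a sentence is compared only against indexed sentences that share at least one fingerprint hash, and winnowing keeps the number of hashes stored per sentence small.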