• DocumentCode
    3727286
  • Title

    Document verification using n-grams and histograms of words

  • Author

    Abdulwahed Almarimi;Gabriela Andrejková;Peter Sedmák

  • Author_Institution
    Faculty of Science, P. J. Šafárik University in Košice, Institute of Computer Science
  • fYear
    2015
  • Firstpage
    21
  • Lastpage
    26
  • Abstract
    The paper analyzes and compares results of methods for document verification based on n-grams and on local histograms built from a document's words, for English and Arabic. English and Arabic texts were analyzed with respect to many statistical characteristics. Some statistical differences between the two languages were found, and n-gram analysis and local histograms were applied to detect dissimilarities between parts of a text. For each text, the results can reveal such dissimilarities and draw attention to the text (or not), depending on whether its parts appear to have been written by the same author. Whether a text is flagged depends on parameters selected in the experiments.
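    The abstract does not spell out the exact features or distance used, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method: the text is split into equal word-count parts, each part gets a character n-gram frequency profile, each profile is compared with the whole-text profile by cosine dissimilarity (a swapped-in choice), and parts whose score exceeds the mean by `threshold` standard deviations are flagged. All function names and the z-score `threshold` parameter are hypothetical stand-ins for the "selected parameters" the abstract mentions.

```python
import math
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Relative frequencies of character n-grams (lowercased, whitespace collapsed)."""
    s = " ".join(text.lower().split())
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    if not grams:
        return Counter()
    counts = Counter(grams)
    return Counter({g: c / len(grams) for g, c in counts.items()})


def cosine_dissimilarity(p: Counter, q: Counter) -> float:
    """1 - cosine similarity of two profiles; 1.0 if either profile is empty."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def split_into_parts(text: str, parts: int) -> list[str]:
    """Split the text into roughly equal word-count parts (hypothetical helper)."""
    words = text.split()
    size = max(1, math.ceil(len(words) / parts))
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def flag_dissimilar_parts(text: str, parts: int = 5, n: int = 3, threshold: float = 1.5):
    """Score each part against the whole text; flag parts whose z-score exceeds threshold."""
    whole = ngram_profile(text, n)
    scores = [cosine_dissimilarity(ngram_profile(p, n), whole)
              for p in split_into_parts(text, parts)]
    if not scores:
        return [], []
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    flagged = [i for i, s in enumerate(scores) if std > 0 and (s - mean) / std > threshold]
    return scores, flagged
```

    For example, a text made of four identical English parts followed by one part of unrelated letter strings yields the highest score for the last part, which is the only one flagged at `threshold=1.5`. Note that with population standard deviation the largest attainable z-score for k parts is (k - 1) / sqrt(k), so the threshold has to be chosen together with the number of parts.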
  • Keywords
    "Histograms","Vocabulary","Standards","Computer science","Electronic mail","Plagiarism","Statistical analysis"
  • Publisher
    ieee
  • Conference_Titel
    Scientific Conference on Informatics, 2015 IEEE 13th International
  • Print_ISBN
    978-1-4673-9867-1
  • Type

    conf

  • DOI
    10.1109/Informatics.2015.7377801
  • Filename
    7377801