DocumentCode
3727286
Title
Document verification using n-grams and histograms of words
Author
Abdulwahed Almarimi;Gabriela Andrejková;Peter Sedmák
Author_Institution
Faculty of Science, P. J. Šafárik University in Košice, Institute of Computer Science
fYear
2015
Firstpage
21
Lastpage
26
Abstract
This paper analyzes and compares the results of methods for document verification based on n-grams and on local histograms built for the words of a document, for English and Arabic. English and Arabic texts were analyzed with respect to many statistical characteristics. Some statistical differences between the two languages were found, and n-gram analysis and local histograms were applied to detect dissimilarities between parts of a text. For each text, the results can reveal dissimilarities and flag the text for attention (or not), indicating whether its parts were written by the same author. Whether a text is flagged depends on the parameters selected in the experiments.
Keywords
"Histograms","Vocabulary","Standards","Computer science","Electronic mail","Plagiarism","Statistical analysis"
Publisher
ieee
Conference_Titel
Scientific Conference on Informatics, 2015 IEEE 13th International
Print_ISBN
978-1-4673-9867-1
Type
conf
DOI
10.1109/Informatics.2015.7377801
Filename
7377801