Title :
Entropy rate of Thai text and testing author authenticity using character combination distribution
Author :
Kiatdarakun, Theerawat ; Suksompong, Prapun
Author_Institution :
Sch. of Inf., Comput. & Commun. Technol. (ICT), Thammasat Univ., Pathum Thani, Thailand
Abstract :
This paper has two main goals. The first is to estimate the entropy rate of Thai text, which is found to be roughly 2 bits/character. The second is to develop methods for text authentication based on probability distributions and information-theoretic quantities. Using the proposed methods, we found that digital books written by the same author yield close numerical values, while books by different authors yield much larger differences. Among the three techniques under consideration, the entropy-based method provides the best test. Thirty Thai text sources of various styles are tested to increase the reliability of the study. A comparison of the effectiveness of the proposed methods is also presented.
Keywords :
authorisation; electronic publishing; entropy; natural languages; statistical distributions; text analysis; Thai text sources; author authenticity testing; character combination distribution; digital books; entropy rate estimation; entropy-based method; information theoretic quantities; numerical values; probability distribution-based text authentication; Authentication; Entropy; Estimation; Joints; Probability distribution; Testing; Writing;
Conference_Title :
Digital Information and Communication Technology and its Applications (DICTAP), 2012 Second International Conference on
Conference_Location :
Bangkok
Print_ISBN :
978-1-4673-0733-8
DOI :
10.1109/DICTAP.2012.6215415