DocumentCode :
690532
Title :
Similarities and Dissimilarities between Character Frequencies of Written Text of Melayu, English, and Indonesian Languages
Author :
Shah, Aamer ; Saidin, Aznan Zuhid ; Taha, Imad Fakhri ; Zeki, Akram M. ; Bhatti, Zeeshan
Author_Institution :
Dept. of Comput. Sci., Int. Islamic Univ. Malaysia, Kuala Lumpur, Malaysia
fYear :
2013
fDate :
23-24 Dec. 2013
Firstpage :
192
Lastpage :
194
Abstract :
This research paper presents statistical similarities and dissimilarities between the character frequencies of three languages: Malay, Indonesian, and English. The languages are compared because they all share a common symbol set (A-Z). The investigation found that, statistically, Malay and Indonesian character frequencies are very close to each other. For example, the characters "A", "N", and "E" have frequencies of (19%, 20.4%), (10%, 9.33%), and (9%, 8.28%), respectively, where each pair gives the (Malay, Indonesian) values. English differs: its three most frequent letters are "E", "T", and "A", in that order. An interesting observation is that, despite some similarities and dissimilarities between individual characters, the frequency envelopes of all three languages follow an identical rising and falling trend across all characters. Moreover, in all three languages the last four characters, "W", "X", "Y", and "Z", also exhibit the lowest usage.
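The abstract does not state how the frequencies were counted. The sketch below is a minimal illustration under stated assumptions: a character's frequency is taken to be its percentage among all A-Z letters in a text after upper-casing, and every other character (digits, punctuation, accented letters) is ignored. The function names `letter_frequencies` and `top_letters` are hypothetical and are not the authors' code. The last lines simply recompute the per-letter gaps between the Malay and Indonesian percentages reported in the abstract.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> dict[str, float]:
    """Hypothetical helper: percentage of each letter A-Z among all A-Z letters.

    Assumption: case-insensitive counting; non A-Z characters are skipped.
    """
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    total = len(letters)
    counts = Counter(letters)
    if total == 0:
        # No letters at all: report 0% for every symbol instead of dividing by zero.
        return {ch: 0.0 for ch in string.ascii_uppercase}
    return {ch: 100.0 * counts[ch] / total for ch in string.ascii_uppercase}


def top_letters(freqs: dict[str, float], k: int = 3) -> list[str]:
    """Hypothetical helper: the k most frequent letters, ties broken alphabetically."""
    return sorted(freqs, key=lambda ch: (-freqs[ch], ch))[:k]


# Malay vs. Indonesian percentages as reported in the abstract: (Malay, Indonesian).
reported = {"A": (19.0, 20.4), "N": (10.0, 9.33), "E": (9.0, 8.28)}
gaps = {ch: round(abs(malay - indo), 2) for ch, (malay, indo) in reported.items()}

if __name__ == "__main__":
    sample = "Saya suka makan nasi"  # illustrative Malay sentence, not data from the paper
    print(top_letters(letter_frequencies(sample)))
    print(gaps)
```

On the reported figures the Malay/Indonesian gap is at most 1.4 percentage points for these three letters, which is the kind of closeness the abstract describes; a real comparison would run `letter_frequencies` over a large corpus per language.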
Keywords :
natural language processing; statistical analysis; text analysis; English language; Indonesian language; Melayu language; character frequencies; statistical dissimilarities; statistical similarities; written text; Computer science; Educational institutions; Information systems; Internet; Market research; Probability; Time-frequency analysis; Character Frequency; Indonesian; Malayu;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Computer Science Applications and Technologies (ACSAT), 2013 International Conference on
Conference_Location :
Kuching
Type :
conf
DOI :
10.1109/ACSAT.2013.45
Filename :
6836574