DocumentCode :
297455
Title :
Entropies of Chinese texts based on three models of Hanyu Pinyin phonetic system
Author :
Huang, Shell Ying ; Ong, Ghim Hwee
Author_Institution :
Div. of Comput. Technol., Sch. of Appl. Sci., Nanyang Technol. Univ., Singapore
Volume :
1
fYear :
1993
fDate :
6-11 Sep 1993
Firstpage :
305
Abstract :
Entropy indicates the lower bound on the number of bits required to represent the information in the texts of a language. It is a function of the probability distribution of the language units. A set of language units together with their probabilities is simply one model of the texts; a different set of units and probabilities provides a different model. This paper reports a study of the entropies of Chinese texts under three models based on the Chinese phonetic system, Hanyu Pinyin. These models yield higher entropy values than the ideogram-based model. However, transcribing Chinese texts in Hanyu Pinyin offers a simple means of Chinese input, and no translation is needed before the texts are stored in computer systems. In addition, the encoded frequency table required by static and semi-adaptive text compression schemes is much smaller than that required for ideograms. This is an important advantage when compressing small to medium-sized text files.
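As a rough illustration of how the choice of language units changes the measured entropy, the sketch below estimates the zeroth-order entropy $H = -\sum_i p_i \log_2 p_i$ (bits per unit) from relative frequencies. The abstract does not define the paper's three Pinyin models, so the two tokenizations here (`letter_model`, `syllable_model`) and the tone-numbered sample text are hypothetical stand-ins, not the authors' models or data.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def entropy_bits(units: Iterable[Hashable]) -> float:
    """Zeroth-order entropy H = -sum p_i * log2(p_i), in bits per unit,
    estimated from the relative frequencies of the units in the sample."""
    counts = Counter(units)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def letter_model(pinyin: str) -> list[str]:
    # Hypothetical model A: every character (letter, tone digit, space) is a unit.
    return list(pinyin)


def syllable_model(pinyin: str) -> list[str]:
    # Hypothetical model B: every space-separated, tone-numbered syllable is a unit.
    return pinyin.split()


# Made-up sample text, for illustration only.
text = "wo3 men5 xue2 xi2 zhong1 wen2 wo3 men5 ai4 zhong1 wen2"

for name, model in [("letter", letter_model), ("syllable", syllable_model)]:
    units = model(text)
    h = entropy_bits(units)
    # n * H is a lower bound on the total bits of any prefix code that assigns
    # one fixed codeword per unit using this (empirical) frequency table.
    print(f"{name}: {len(set(units))} distinct units, "
          f"{h:.3f} bits/unit, >= {h * len(units):.1f} bits total")
```

The number of distinct units printed for each model is roughly the number of entries a static or semi-adaptive scheme must store in its frequency table. A small Pinyin unit inventory keeps that table far smaller than one covering thousands of ideograms, which is the overhead the abstract highlights for small to medium-sized files.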
Keywords :
computational linguistics; entropy; speech processing; word processing; Chinese texts; Hanyu Pinyin; entropies; frequency table; phonetic system; text compression schemes; Computer science; Entropy; Frequency; Information systems; Natural languages; Probability distribution; Size measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of IEEE Singapore International Conference on Networks / International Conference on Information Engineering '93, 'Communications and Networks for the Year 2000', 1993
Print_ISBN :
0-7803-1445-X
Type :
conf
DOI :
10.1109/SICON.1993.515776
Filename :
515776