DocumentCode :
311134
Title :
A language model based on semantically clustered words in a Chinese character recognition system
Author :
Lee, Hsi-Jian ; Tung, Cheng-Huang
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Volume :
1
fYear :
1995
fDate :
14-16 Aug 1995
Firstpage :
450
Abstract :
This paper presents a new method for clustering the words in a dictionary into word groups, which are applied in a Chinese character recognition system with a language model that describes the contextual information. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2, which provides the semantic features, is used to train the weights of the semantic attributes of the character-based word classes. These weights are then updated according to the words of the behavior dictionary, which has a fairly complete word set. Next, the updated word classes are clustered into m groups by a greedy method according to a semantic measurement. Finally, the words in the behavior dictionary are assigned to the m groups, so the parameter space for the bigram contextual information of the character recognition system is m². Experimental results show that the recognition system with the proposed model performs better than one using a character-based bigram language model.
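The abstract does not specify the semantic measurement, the greedy criterion, or how the group bigrams are estimated, so the following is only a minimal sketch under stated assumptions. It assumes each word class is a sparse vector of semantic-attribute weights, uses cosine similarity as the (hypothetical) semantic measurement, greedily merges the most similar pair of clusters until m remain, and then estimates an m × m group-bigram table with add-one smoothing. The helper names, the toy attribute vectors, and the smoothing choice are illustrative assumptions, not the authors' method.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse attribute-weight vectors (dicts).
    Assumed stand-in for the paper's unspecified semantic measurement."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def add_vectors(u, v):
    """Sum two sparse vectors; the sum serves as an unnormalized centroid."""
    out = dict(u)
    for k, w in v.items():
        out[k] = out.get(k, 0.0) + w
    return out


def greedy_cluster(vectors, m):
    """Greedily merge the most similar pair of clusters until at most m remain.

    vectors: {word_class: {semantic_attribute: weight}} (hypothetical format)
    Returns {word_class: group_index}.
    """
    members = [[name] for name in vectors]
    centroids = [dict(vec) for vec in vectors.values()]
    while len(members) > m:
        i, j = max(combinations(range(len(members)), 2),
                   key=lambda p: cosine(centroids[p[0]], centroids[p[1]]))
        members[i].extend(members[j])
        centroids[i] = add_vectors(centroids[i], centroids[j])
        del members[j], centroids[j]  # j > i, so index i stays valid
    return {name: g for g, group in enumerate(members) for name in group}


def class_bigram(sentences, group_of, m):
    """Estimate P(g_cur | g_prev) over m groups with add-one smoothing.

    The returned table has m * m entries, one per group pair.
    """
    counts = [[0] * m for _ in range(m)]
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            counts[group_of[prev]][group_of[cur]] += 1
    return [[(c + 1) / (sum(row) + m) for c in row] for row in counts]


# Hypothetical toy data: four word classes with made-up attribute weights.
vectors = {
    "dog":  {"animal": 1.0, "pet": 0.8},
    "cat":  {"animal": 1.0, "pet": 0.9},
    "run":  {"motion": 1.0, "fast": 0.7},
    "walk": {"motion": 1.0, "slow": 0.6},
}
group_of = greedy_cluster(vectors, m=2)
table = class_bigram([["dog", "run"], ["cat", "walk"]], group_of, m=2)
print(group_of)  # {'dog': 0, 'cat': 0, 'run': 1, 'walk': 1}
print(table)     # [[0.25, 0.75], [0.5, 0.5]]
```

The point of grouping is the table size: however large the vocabulary, the contextual model stores only m × m group-transition probabilities. In a full recognizer these would be combined with per-character recognition scores, which this sketch does not model.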
Keywords :
character recognition; computational linguistics; Chinese character recognition; Chinese synonym dictionary; Tong2yi4ci2 ci2lin2; behavior dictionary; character recognition system; language model; semantic attributes; semantically clustered words; Character recognition; Computer science; Context modeling; Dictionaries; Error correction; Natural languages; Postal services; Random access memory;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
Type :
conf
DOI :
10.1109/ICDAR.1995.599033
Filename :
599033