Title :
Influence of language models and candidate set size on contextual post-processing for Chinese script recognition
Author :
Li, Yuan-Xiang ; Tan, Chew Lim
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
Abstract :
In the Chinese language, a word consisting of one or more characters is a basic syntax-meaningful unit; however, each character in the word also has a definite meaning of its own. We compare the perplexities of four n-gram language models (character-based bigram, character-based trigram, word-based bigram and class-based bigram) and their influence on the performance of contextual post-processing of Chinese scripts in an offline handwritten Chinese character recognition system. We also examine in detail how the candidate set size affects the performance of contextual post-processing, and show that the number of candidates should vary from script to script.
Keywords :
handwritten character recognition; natural languages; text analysis; word processing; Chinese script recognition; basic syntax-meaningful unit; candidate set size; character-based bigram; character-based trigram; class-based bigram; contextual post-processing; language models; offline handwritten Chinese character recognition system; word-based bigram; Character recognition; Computational modeling; Context modeling; Handwriting recognition; Image recognition; Natural languages; Pattern recognition; Probability; Shape; Writing;
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
Print_ISBN :
0-7695-2128-2
DOI :
10.1109/ICPR.2004.1334295
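
The abstract names two measurable quantities: the perplexity of an n-gram language model, and the effect of the candidate set size on contextual post-processing. The sketch below illustrates both with a toy character-based bigram model. It is not the authors' implementation. The add-one smoothing, the way recognizer scores are combined with language-model probabilities, the ASCII toy corpus, and the names `CharBigram`, `postprocess`, `lm_weight` and `k` are all assumptions made for illustration; the record above does not specify any of them.

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical start-of-script marker


class CharBigram:
    """Character-based bigram model with add-one smoothing (assumed choice)."""

    def __init__(self, corpus):
        self.bigrams = Counter()   # counts of (previous, current) pairs
        self.contexts = Counter()  # counts of each character used as a history
        self.vocab = set()
        for sentence in corpus:
            prev = BOS
            for ch in sentence:
                self.bigrams[(prev, ch)] += 1
                self.contexts[prev] += 1
                self.vocab.add(ch)
                prev = ch

    def logprob(self, prev, ch):
        # The extra 1 in the denominator reserves mass for unseen characters.
        v = len(self.vocab) + 1
        return math.log((self.bigrams[(prev, ch)] + 1) / (self.contexts[prev] + v))

    def perplexity(self, sentences):
        # exp of the negative mean log-probability per character
        total, n = 0.0, 0
        for sentence in sentences:
            prev = BOS
            for ch in sentence:
                total += self.logprob(prev, ch)
                n += 1
                prev = ch
        return math.exp(-total / n)


def postprocess(model, lattice, k, lm_weight=1.0):
    """Viterbi search over the top-k recognizer candidates at each position.

    lattice: one list per character position, holding (character, score)
    pairs sorted by descending score, with score in (0, 1].
    """
    best = {BOS: (0.0, "")}  # last character -> (log score, best path so far)
    for candidates in lattice:
        nxt = {}
        for ch, score in candidates[:k]:
            nxt[ch] = max(
                (lp + lm_weight * model.logprob(prev, ch) + math.log(score), path + ch)
                for prev, (lp, path) in best.items()
            )
        best = nxt
    return max(best.values())[1]


# Toy data: ASCII letters stand in for Chinese characters.
model = CharBigram(["the cat sat", "the hat", "that cat"])
lattice = [
    [("t", 0.60), ("f", 0.40)],
    [("b", 0.50), ("h", 0.45), ("n", 0.05)],  # recognizer's top choice is wrong
    [("e", 0.70), ("c", 0.30)],
]
for k in (1, 2, 3):
    print(k, postprocess(model, lattice, k))
print(model.perplexity(["the cat"]))
```

With `k=1` the search can only return the recognizer's first choices, so the language model cannot correct anything. A larger `k` lets context recover a correct character that the recognizer ranked lower, at the cost of a larger search space and more chances to pick a wrong but plausible candidate, which is the trade-off the abstract refers to.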