DocumentCode :
2029463
Title :
An empirical study of statistical language models for contextual post-processing of Chinese script recognition
Author :
Li, Yuan-Xiang ; Tan, Chew Lim
Author_Institution :
Sch. of Comput., National Univ. of Singapore, Singapore
fYear :
2004
fDate :
26-29 Oct. 2004
Firstpage :
257
Lastpage :
262
Abstract :
Statistical language models (LMs) are crucial for improving the accuracy of offline Chinese script recognition. In this paper, we investigate the influence of several LMs on the contextual post-processing performance of Chinese script recognition. We first introduce seven LMs: three conventional LMs (character-based bigram, character-based trigram, and word-based bigram), two class-based bigram LMs, and two hybrid bigram LMs that combine word-based and class-based bigrams. We then investigate how LM perplexity is affected by training corpus size, smoothing method, and count cutoffs. Next, we demonstrate how these LMs influence post-processing performance in terms of recognition accuracy, memory requirements, and processing speed. Finally, we propose how to select a suitable LM for real recognition tasks.
Keywords :
character recognition; context-sensitive languages; natural languages; statistical analysis; Chinese script recognition; character-based bigram; character-based trigram; contextual post-processing; statistical language models; word-based bigram; Character recognition; Context modeling; Handwriting recognition; Image recognition; Natural languages; Pattern recognition; Proposals; Shape; Smoothing methods; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004. Ninth International Workshop on
ISSN :
1550-5235
Print_ISBN :
0-7695-2187-8
Type :
conf
DOI :
10.1109/IWFHR.2004.15
Filename :
1363920