Influence of language models and candidate set size on contextual post-processing for Chinese script recognition

Author

Li, Yuan-Xiang ; Tan, Chew Lim

Author_Institution

Sch. of Comput., Nat. Univ. of Singapore, Singapore

Volume

2

fYear

2004

fDate

23-26 Aug. 2004

Firstpage

537

Abstract

In the Chinese language, a word consisting of one or more characters is a basic syntax-meaningful unit; however, each character in the word also has a definite meaning of its own. We compare the perplexities of four n-gram language models (character-based bigram, character-based trigram, word-based bigram, and class-based bigram) and their influence on the performance of contextual post-processing of Chinese scripts in an offline handwritten Chinese character recognition system. We also demonstrate in detail how the candidate set size influences the performance of contextual post-processing, and indicate that the number of candidates should vary with each script.

Keywords

handwritten character recognition; natural languages; text analysis; word processing; Chinese script recognition; basic syntax-meaningful unit; candidate set size; character-based bigram; character-based trigram; class-based bigram; contextual post-processing; language models; offline handwritten Chinese character recognition system; word-based bigram; Character recognition; Computational modeling; Context modeling; Handwriting recognition; Image recognition; Natural languages; Pattern recognition; Probability; Shape; Writing

fLanguage

English

Publisher

IEEE

Conference_Titel

Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on

ISSN

1051-4651

Print_ISBN

0-7695-2128-2

Type

conf

DOI

10.1109/ICPR.2004.1334295

Filename

1334295