DocumentCode
311110
Title
Japanese document recognition based on interpolated n-gram model of character
Author
Mori, Hiroki ; Aso, Hirotomo ; Makino, Shozo
Author_Institution
Graduate Sch. of Eng., Tohoku Univ., Sendai, Japan
Volume
1
fYear
1995
fDate
14-16 Aug 1995
Firstpage
274
Abstract
N-gram models are widely applied in pattern recognition systems because they represent local features of natural languages well. In this paper, we describe a contextual postprocessing method for Japanese document recognition that uses a character trigram model, and we demonstrate its advantage in practical experiments. The model is obtained automatically by statistical processing of training documents. Its ability to reduce ambiguity is evaluated by perplexity. Two smoothing methods are examined, and the deleted interpolation method is shown to have superior predictive power. For leading articles, the perplexity is reduced to about 22 when deleted interpolation is used. The OCR output is processed very quickly using a Viterbi algorithm. Recognition experiments on three kinds of documents show error correction rates ranging from 75 to over 90 percent.
Keywords
maximum likelihood estimation; optical character recognition; Japanese document recognition; Viterbi algorithm; contextual postprocessing method; deleted interpolation method; error correction rates; interpolated n-gram model; local feature; pattern recognition system; perplexity; smoothing methods; statistical processing; training documents; trigram model; Character recognition; Context modeling; Error correction; Interpolation; Natural languages; Optical character recognition software; Pattern recognition; Power system modeling
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location
Montreal, Que.
Print_ISBN
0-8186-7128-9
Type
conf
DOI
10.1109/ICDAR.1995.598993
Filename
598993