Title :
Japanese document recognition based on interpolated n-gram model of character
Author :
Mori, Hiroki ; Aso, Hirotomo ; Makino, Shozo
Author_Institution :
Graduate Sch. of Eng., Tohoku Univ., Sendai, Japan
Abstract :
N-gram models are widely applied in pattern recognition systems because they represent local features of natural languages well. In this paper, we describe a contextual postprocessing method for Japanese document recognition that uses a character trigram model, and we demonstrate its advantage through practical experiments. The model is obtained automatically by statistical processing of training documents. Its ability to reduce ambiguity is evaluated by perplexity. Two smoothing methods are examined, and the deleted interpolation method is shown to have superior predictive power. For leading articles, the perplexity is reduced to about 22 when deleted interpolation is used. The OCR output is processed very quickly using a Viterbi algorithm. Recognition experiments on three kinds of documents show error correction rates ranging from 75 to over 90 percent.
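As an illustration of the quantities named in the abstract, the following is a minimal sketch of an interpolated character trigram model and its perplexity. It is not the authors' implementation: the class name `InterpolatedTrigram`, the `^` padding symbol, the uniform floor term, and the fixed weights are all assumptions made for this example. In deleted interpolation the weights would be estimated from held-out portions of the training data rather than fixed by hand.

```python
import math
from collections import Counter


class InterpolatedTrigram:
    """Character trigram model smoothed by linear interpolation.

    P(c | a, b) = l3 * f(c | a, b) + l2 * f(c | b) + l1 * f(c) + l0 / V
    The weights (l3, l2, l1, l0) are fixed here for simplicity (an
    assumption); deleted interpolation would estimate them from held-out data.
    """

    PAD = "^"  # hypothetical start-of-text symbol, assumed absent from the text

    def __init__(self, text, lambdas=(0.6, 0.3, 0.09, 0.01)):
        assert math.isclose(sum(lambdas), 1.0), "weights must sum to 1"
        self.lambdas = lambdas
        self.vocab = set(text)
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()
        padded = self.PAD * 2 + text
        for i in range(2, len(padded)):
            a, b, c = padded[i - 2], padded[i - 1], padded[i]
            self.uni[c] += 1
            self.bi[(b, c)] += 1
            self.tri[(a, b, c)] += 1
            self.ctx1[b] += 1        # how often b occurs as a bigram context
            self.ctx2[(a, b)] += 1   # how often (a, b) occurs as a trigram context
        self.total = len(text)

    @staticmethod
    def _ratio(num, den):
        # Relative frequency; an unseen context contributes zero.
        return num / den if den else 0.0

    def prob(self, a, b, c):
        """Interpolated probability of character c following characters a, b."""
        l3, l2, l1, l0 = self.lambdas
        return (
            l3 * self._ratio(self.tri[(a, b, c)], self.ctx2[(a, b)])
            + l2 * self._ratio(self.bi[(b, c)], self.ctx1[b])
            + l1 * self._ratio(self.uni[c], self.total)
            + l0 / len(self.vocab)
        )

    def perplexity(self, text):
        """exp of the average negative log-probability per character."""
        padded = self.PAD * 2 + text
        log_sum = 0.0
        for i in range(2, len(padded)):
            log_sum += math.log(self.prob(padded[i - 2], padded[i - 1], padded[i]))
        return math.exp(-log_sum / len(text))
```

A lower perplexity means the model leaves fewer plausible candidates per character position, which is the sense in which the abstract uses it to measure the ability to reduce ambiguity. With all weight placed on the uniform term, the perplexity equals the vocabulary size.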
Keywords :
maximum likelihood estimation; optical character recognition; Japanese document recognition; Viterbi algorithm; contextual postprocessing method; deleted interpolation method; error correction rates; interpolated n-gram model; local feature; pattern recognition system; perplexity; smoothing methods; statistical processing; training documents; trigram model; Character recognition; Context modeling; Error correction; Interpolation; Natural languages; Optical character recognition software; Pattern recognition; Power system modeling
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.598993