Regional Information Center for Science and Technology - Japanese document recognition based on interpolated n-gram model of character

DocumentCode :

311110

Title :

Japanese document recognition based on interpolated n-gram model of character

Author :

Mori, Hiroki ; Aso, Hirotomo ; Makino, Shozo

Author_Institution :

Graduate Sch. of Eng., Tohoku Univ., Sendai, Japan

Volume :

fYear :

1995

fDate :

14-16 Aug 1995

Firstpage :

274

Abstract :

N-gram models are widely applied in pattern recognition systems because they represent the local features of natural languages well. In this paper, we describe a contextual postprocessing method for Japanese document recognition that uses a character trigram model, and we demonstrate its advantages through practical experiments. The model is obtained automatically by statistical processing of training documents, and its ability to reduce ambiguity is evaluated by perplexity. Two smoothing methods are examined, and the deleted interpolation method is shown to have superior predictive power. For leading articles, the perplexity was reduced to about 22 when deleted interpolation was used. The OCR output is processed very quickly using the Viterbi algorithm. Recognition experiments on three kinds of documents show error correction rates ranging from 75 to over 90 percent.
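The abstract combines three techniques: an interpolated character trigram model, perplexity as the measure of how well it reduces ambiguity, and Viterbi decoding over OCR output. The Python sketch below illustrates them under stated assumptions. It is not the authors' implementation. The class name InterpolatedTrigram, the start-of-text symbol "<s>", the four-way linear mix with a uniform floor, and the EM re-estimation of the weights on a single held-out text are all hypothetical choices; the last is a simplified, non-rotating form of deleted interpolation. The decoder also scores OCR candidate characters by the language model alone and ignores any recognizer confidence, which the paper's method may well use.

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical start-of-text padding symbol


class InterpolatedTrigram:
    """Illustrative sketch (not the paper's code) of a character trigram model:
    P(c | a, b) = l3*f(c|a,b) + l2*f(c|b) + l1*f(c) + l0/V,
    where f are relative frequencies from training text and the weights l
    can be re-estimated by EM on held-out text (assumed procedure).
    """

    def __init__(self, train_text):
        padded = [BOS, BOS] + list(train_text)
        self.uni = Counter(padded[2:])
        self.bi = Counter(zip(padded[1:], padded[2:]))
        self.tri = Counter(zip(padded, padded[1:], padded[2:]))
        # Context counts derived from the n-gram counts they condition.
        self.bi_ctx, self.tri_ctx = Counter(), Counter()
        for (b, _), n in self.bi.items():
            self.bi_ctx[b] += n
        for (a, b, _), n in self.tri.items():
            self.tri_ctx[(a, b)] += n
        self.total = len(train_text)
        self.vocab_size = len(self.uni) + 1  # one extra slot for unseen characters
        self.lambdas = [0.25, 0.25, 0.25, 0.25]  # trigram, bigram, unigram, uniform

    def components(self, a, b, c):
        """Return the four component probabilities (trigram, bigram, unigram, uniform)."""
        n3, n2 = self.tri_ctx[(a, b)], self.bi_ctx[b]
        p3 = self.tri[(a, b, c)] / n3 if n3 else 0.0
        p2 = self.bi[(b, c)] / n2 if n2 else 0.0
        p1 = self.uni[c] / self.total
        return (p3, p2, p1, 1.0 / self.vocab_size)

    def prob(self, a, b, c):
        return sum(l * p for l, p in zip(self.lambdas, self.components(a, b, c)))

    def fit_lambdas(self, heldout_text, iterations=20):
        """EM re-estimation of the mixing weights on text not used for counting."""
        padded = [BOS, BOS] + list(heldout_text)
        events = [self.components(*t) for t in zip(padded, padded[1:], padded[2:])]
        for _ in range(iterations):
            expected = [0.0] * 4
            for comps in events:
                weighted = [l * p for l, p in zip(self.lambdas, comps)]
                z = sum(weighted)  # > 0: the uniform term never vanishes
                for i, w in enumerate(weighted):
                    expected[i] += w / z
            self.lambdas = [e / len(events) for e in expected]

    def perplexity(self, text):
        """exp of the average negative log-probability per character."""
        padded = [BOS, BOS] + list(text)
        log_sum = sum(math.log(self.prob(a, b, c))
                      for a, b, c in zip(padded, padded[1:], padded[2:]))
        return math.exp(-log_sum / len(text))

    def viterbi(self, candidates):
        """Pick one character per position from OCR candidate lists,
        maximising the trigram log-probability of the whole string."""
        # State = the last two characters; value = (log prob, best path ending there).
        best = {(BOS, BOS): (0.0, [])}
        for options in candidates:
            nxt = {}
            for (a, b), (score, path) in best.items():
                for c in options:
                    s = score + math.log(self.prob(a, b, c))
                    if (b, c) not in nxt or s > nxt[(b, c)][0]:
                        nxt[(b, c)] = (s, path + [c])
            best = nxt
        return "".join(max(best.values(), key=lambda v: v[0])[1])
```

For example, a model trained on "今日は天気が良い。今日は雨が降る。" and given the candidate lists [["令", "今"], ["日", "目"], ["は"]] returns "今日は". It prefers the character sequence the training text makes likely over the visually similar alternatives. The Viterbi state is the pair of the two previous characters, which is exactly the context a trigram needs, so the search is exact rather than a beam approximation.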

Keywords :

maximum likelihood estimation; optical character recognition; Japanese document recognition; Viterbi algorithm; contextual postprocessing method; deleted interpolation method; error correction rates; interpolated n-gram model; local feature; pattern recognition system; perplexity; smoothing methods; statistical processing; training documents; trigram model; Character recognition; Context modeling; Error correction; Interpolation; Natural languages; Optical character recognition software; Pattern recognition; Power system modeling; Smoothing methods; Viterbi algorithm;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on

Conference_Location :

Montreal, Que.

Print_ISBN :

0-8186-7128-9

Type :

conf

DOI :

10.1109/ICDAR.1995.598993

Filename :

598993

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=311110