DocumentCode :
2629655
Title :
Postprocessing algorithm based on the probabilistic and semantic method for Japanese OCR
Author :
Konno, Akiko ; Hongo, Yasuo
Author_Institution :
Fuji Electric Corp. Res. & Dev., Ltd., Tokyo, Japan
fYear :
1993
fDate :
20-22 Oct 1993
Firstpage :
646
Lastpage :
649
Abstract :
A postprocessing algorithm for Japanese OCR based on a probabilistic and semantic method is described. It determines the reliability of each recognized character and word in the OCR output from its recognition probability and its appearance frequency in the text. Using these reliabilities, grammatical word paths are searched in each phrase. When the most suitable word must be selected from a set of similar words, the semantic method attempts to choose one using a co-occurrence word dictionary. The method was applied to OCR outputs of current newspapers and technical documents that included some unregistered words, and its performance was evaluated. While the error correction rate depends on the ratio of unregistered words in the texts, the error detection rate is almost 90%.
Keywords :
glossaries; optical character recognition; probability; Japanese OCR; appearance frequency; co-occurrence word dictionary; current newspapers; error correction rate; error detection rate; grammatical word paths; postprocessing algorithm; probabilistic method; recognition probability; recognized character; reliability; semantic method; technical documents; text; unregistered words; Character recognition; Dictionaries; Error correction; Frequency; Laboratories; Natural languages; Optical character recognition software; Software algorithms; Software systems; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
Type :
conf
DOI :
10.1109/ICDAR.1993.395654
Filename :
395654