DocumentCode :
3305457
Title :
Contextual postprocessing of a Korean OCR system by linguistic constraints
Author :
Kwon, Hyuk-Chul ; Hwang, Ho-Jeong ; Kim, Min-Jung ; Lee, Seong-Whan
Author_Institution :
Dept. of Comput. Sci., Pusan Nat. Univ., South Korea
Volume :
2
fYear :
1995
fDate :
14-16 Aug 1995
Firstpage :
557
Abstract :
This paper presents a contextual postprocessing approach that selects the most feasible word from the multiple output strings of an OCR system. Correction is applied only when selection fails. The selected word is then confirmed by its collocation with the adjacent words. The system applies five functions: (1) selecting a word from the candidate words, (2) correcting candidate words using a confusion matrix of syllables, (3) combining two substrings into a word that spans two lines, (4) guessing unknown nouns, and (5) confirming a selected word using the contextual information of adjacent words. To improve speed, the system uses syllable di-grams and viable-prefixes of Korean words. Experimental results show that these two heuristics speed up the system by a factor of more than 1,000 in the worst case. The system improves the word recognition rate of the OCR system from 90.50% to 94.72%.
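The word-selection step with viable-prefix pruning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny lexicon, the candidate lists, and the function names `build_prefixes` and `select_words` are all hypothetical, and the sketch assumes the OCR system returns a list of candidate syllables for each character position.

```python
from typing import Iterable

def build_prefixes(lexicon: Iterable[str]) -> set[str]:
    """Collect every non-empty prefix of every lexicon word (the 'viable' prefixes)."""
    prefixes = set()
    for word in lexicon:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])
    return prefixes

def select_words(candidates: list[list[str]], lexicon: set[str],
                 prefixes: set[str]) -> list[str]:
    """Return lexicon words that can be formed by picking one candidate
    syllable per position, abandoning any partial string that is not a
    viable prefix of some lexicon word."""
    results = []

    def extend(position: int, partial: str) -> None:
        if position == len(candidates):
            if partial in lexicon:
                results.append(partial)
            return
        for syllable in candidates[position]:
            extended = partial + syllable
            # Prune: no lexicon word starts with this string.
            if extended in prefixes:
                extend(position + 1, extended)

    extend(0, "")
    return results

# Hypothetical data: a three-word lexicon and two positions of OCR candidates.
LEXICON = {"학교", "학생", "한국"}
PREFIXES = build_prefixes(LEXICON)
CANDIDATES = [["학", "한"], ["교", "국", "생"]]

words = select_words(CANDIDATES, LEXICON, PREFIXES)
print(words)
```

Without pruning, every combination of candidate syllables would have to be generated and looked up; the prefix check discards a branch as soon as it cannot lead to a known word, which is the kind of saving the abstract attributes to viable-prefixes. The confusion-matrix correction and collocation check are not modeled here.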
Keywords :
computational linguistics; natural languages; optical character recognition; Korean OCR; Korean words; collocation; contextual postprocessing; linguistic constraints; multiple output strings; syllable di-grams; viable-prefixes; word recognition; Character recognition; Computer science; Humans; Information analysis; Information filtering; Information filters; Natural languages; Optical character recognition software; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
Type :
conf
DOI :
10.1109/ICDAR.1995.601958
Filename :
601958