Title :
Word recognition in a segmentation-free approach to OCR
Author :
Chen, C.H. ; DeCurtins, J.L.
Author_Institution :
SRI Int., Menlo Park, CA, USA
Abstract :
Segmentation is a key step in current OCR systems. It has been estimated that half the errors in character recognition are due to segmentation. A novel approach that performs OCR without the segmentation step was developed. The approach starts by extracting significant geometric features from the input document image of the page. Each feature then votes for the character that could have generated that feature. Thus, even if some of the features are occluded or lost due to degradation, the remaining features can successfully identify the character. In extreme cases, the degradation may be severe enough to prevent recognition of some of the characters in a word. In such cases, a lexicon-based word recognition technique is used to resolve ambiguity. Inexact matching and probabilistic evaluation used in the technique make it possible to identify the correct word from only a partial set of detected characters. The authors first present an overview of their segmentation-free OCR system and then focus on the word recognition technique. Preliminary experimental results show that this is a very promising approach.
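As an illustration of the lexicon-based step described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that character hypotheses have already been grouped by position and that the word length is known, neither of which the abstract states. The names `score_word`, `recognize_word`, and `miss_prob` are hypothetical, and the log-probability score is a generic stand-in for the paper's probabilistic evaluation.

```python
from math import log


def score_word(candidates, word, miss_prob=0.05):
    """Log-probability-style score of a lexicon word against per-position hypotheses.

    candidates: one dict per character position, mapping a character to its
    (assumed) probability; an empty dict means nothing was recognized there.
    miss_prob: assumed probability for a character that received no support,
    which is what lets a partially recognized word still be matched.
    """
    if len(candidates) != len(word):
        return float("-inf")  # simplifying assumption: lengths must agree
    return sum(log(votes.get(ch, miss_prob)) for votes, ch in zip(candidates, word))


def recognize_word(candidates, lexicon):
    """Return the lexicon entry with the highest score (inexact matching)."""
    return max(lexicon, key=lambda w: score_word(candidates, w))


# Hypothetical example: positions 2 and 4 were lost to degradation.
candidates = [{"c": 0.9, "e": 0.1}, {}, {"o": 0.7, "a": 0.3}, {}, {"k": 0.8}]
lexicon = ["clock", "check", "clack", "block", "chalk"]
best = recognize_word(candidates, lexicon)
print(best)
```

With these made-up values, "clock" scores highest because three of its five characters are supported by the surviving hypotheses, while the two unrecognized positions penalize every candidate equally.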
Keywords :
feature extraction; optical character recognition; word processing; OCR systems; character recognition; degradation; feature extraction; input document image; lexicon-based word recognition technique; probabilistic evaluation; segmentation-free approach; significant geometric features; Automation; Character generation; Character recognition; Degradation; Feature extraction; Hardware; Image recognition; Image segmentation; Optical character recognition software; Printing;
Conference_Title :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
DOI :
10.1109/ICDAR.1993.395670