Title :
Algorithms for postprocessing OCR results with visual inter-word constraints
Author :
Hong, Tao ; Hull, Jonathan J.
Author_Institution :
Center of Excellence for Document Analysis and Recognition, State University of New York, Buffalo, NY, USA
Abstract :
Algorithms are presented that determine the visual relationships between word images in a document. These relationships include instances of common word images and of common substrings, both of which occur often in English-language text images. This information is then used to improve the performance of a commercial optical character recognition (OCR) algorithm. The algorithms calculate clusters of equivalent word images as well as common initial and final substrings. Experimental results show a 40% reduction in word-level error rate on a test set of documents degraded by uniform noise.
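The abstract does not spell out how the clusters of equivalent word images are built or how they feed back into the OCR output. The following is only a minimal sketch of the general idea, under stated assumptions: word images are equal-length binary bitmaps, equivalence is approximated by a greedy Hamming-distance threshold, and OCR results are corrected by majority vote within each cluster. The function names, the threshold value, and the voting rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def hamming_fraction(a, b):
    """Fraction of differing pixels between two binary bitmaps.

    Bitmaps of different (or zero) length are treated as maximally distant.
    """
    if len(a) != len(b) or not a:
        return 1.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_word_images(bitmaps, threshold=0.2):
    """Greedy clustering (an assumed stand-in for the paper's method).

    Each bitmap joins the first cluster whose representative lies within
    `threshold`; otherwise it starts a new cluster.
    """
    representatives = []
    labels = []
    for bmp in bitmaps:
        for cid, rep in enumerate(representatives):
            if hamming_fraction(bmp, rep) <= threshold:
                labels.append(cid)
                break
        else:
            representatives.append(bmp)
            labels.append(len(representatives) - 1)
    return labels


def vote_within_clusters(labels, ocr_words):
    """Replace each OCR result with the most frequent result in its cluster."""
    by_cluster = defaultdict(list)
    for cid, word in zip(labels, ocr_words):
        by_cluster[cid].append(word)
    winners = {
        cid: Counter(words).most_common(1)[0][0]
        for cid, words in by_cluster.items()
    }
    return [winners[cid] for cid in labels]


# Toy data: three noisy renderings of one word image and one different image.
clean = (1, 1, 0, 0, 1, 1, 0, 0, 1, 1)
noisy = (1, 1, 0, 1, 1, 1, 0, 0, 1, 1)   # one flipped pixel
other = tuple(1 - p for p in clean)      # every pixel inverted

labels = cluster_word_images([clean, noisy, clean, other])
corrected = vote_within_clusters(labels, ["the", "tbe", "the", "cat"])
```

In this toy run the misrecognized "tbe" shares a cluster with two images recognized as "the", so the vote overrides it. A real system would also need size normalization and a more robust image distance, and it would have to handle clusters where the most frequent OCR result is itself wrong.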
Keywords :
document image processing; optical character recognition; English language text images; OCR algorithm; OCR results; experimental results; optical character recognition algorithm; performance; postprocessing algorithms; substrings; text document; uniform noise; visual interword constraints; word images; word level error rate; Character recognition; Clustering algorithms; Degradation; Error analysis; Natural languages; Noise level; Noise reduction; Optical character recognition software; Optical noise; Testing;
Conference_Title :
Image Processing, 1995. Proceedings., International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
0-8186-7310-9
DOI :
10.1109/ICIP.1995.537638