DocumentCode :
1125031
Title :
On the Recognition of Printed Characters of Any Font and Size
Author :
Kahan, Simon ; Pavlidis, Theo ; Baird, Henry S.
Author_Institution :
Department of Computer Science, University of Washington, Seattle, WA 98195; AT&T Bell Laboratories, Murray Hill, NJ 07974.
Issue :
2
fYear :
1987
fDate :
3/1/1987
Firstpage :
274
Lastpage :
288
Abstract :
We describe the current state of a system that recognizes printed text in the Roman alphabet across various fonts and sizes. The system combines several techniques to improve the overall recognition rate. Thinning and shape extraction are performed directly on a graph of the run-length encoding of a binary image. The resulting strokes and other shapes are mapped, using a shape-clustering approach, into binary features, which are then fed into a statistical Bayesian classifier. Large-scale trials have shown better than 97 percent top-choice accuracy on mixtures of six dissimilar fonts, and better than 99 percent on most single fonts, across a range of point sizes. Certain remaining confusion classes are disambiguated through contour analysis, and characters suspected of being merged are split and reclassified. Finally, layout and linguistic context are applied. The results are illustrated with sample pages.
Keywords :
Bayesian methods; Character recognition; Computer graphics; Image coding; Large-scale systems; Layout; Manufacturing; Optical character recognition software; Shape; Text recognition; reading machines; spelling correction
fLanguage :
English
Journal_Title :
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publisher :
IEEE
ISSN :
0162-8828
Type :
Journal article
DOI :
10.1109/TPAMI.1987.4767901
Filename :
4767901