Title :
Bootstrapping text recognition from stop words
Author_Institution :
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
Recognition of arbitrary noisy English text has been difficult because of problems in character segmentation and multi-font symbol classification. Both segmentation and recognition become easier with more knowledge of the dominant font used in a given text page. This has led to recent studies showing promising methods for extracting character prototypes from a text image, provided that ground truth is given for part of the image. In this paper we investigate the feasibility of such a strategy without dependence on ground truth. We replace the needed truth with the results of direct recognition of some frequently occurring words. The method makes use of the observation that over half of the words in a typical English text passage are contained in a very small lexicon.
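The coverage observation in the abstract can be checked on any passage with a few lines of Python. The sketch below is illustrative only: the `STOP_WORDS` set and the `stop_word_coverage` function are hypothetical names introduced here, and the word list is a small hand-picked sample, not the lexicon used in the paper.

```python
import re

# Hypothetical, hand-picked sample of common English function words.
# This is NOT the lexicon used in the paper; it only illustrates the idea.
STOP_WORDS = frozenset({
    "the", "of", "and", "to", "a", "in", "is", "that", "it", "for",
    "was", "on", "with", "as", "be", "by", "at", "this", "are", "or",
    "from", "has", "have", "been", "can", "we", "some",
})


def stop_word_coverage(text: str) -> float:
    """Return the fraction of word tokens in `text` found in STOP_WORDS."""
    # Tokenize on runs of ASCII letters after lowercasing (a simplification).
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in STOP_WORDS)
    return hits / len(tokens)
```

Calling `stop_word_coverage` on a passage gives the share of its tokens that fall in the sample list; the value depends on the passage and on the list chosen, so it should not be read as a reproduction of the paper's measurement.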
Keywords :
computer vision; document image processing; feature extraction; image segmentation; knowledge based systems; learning systems; optical character recognition; English text; OCR; adaptive text recognition; bootstrapping; character segmentation; stop words; word recognition; Application software; Character recognition; Computer science; Image analysis; Image recognition; Optical character recognition software; Prototypes; Shape; Testing; Text recognition
Conference_Title :
Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on
Conference_Location :
Brisbane, Qld.
Print_ISBN :
0-8186-8512-3
DOI :
10.1109/ICPR.1998.711216