DocumentCode :
2708901
Title :
Bootstrapping text recognition from stop words
Author :
Ho, Tin Kam
Author_Institution :
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
Volume :
1
fYear :
1998
fDate :
16-20 Aug 1998
Firstpage :
605
Abstract :
Recognition of arbitrary noisy English text has been difficult because of problems in character segmentation and multi-font symbol classification. Both segmentation and recognition can be easier with more knowledge of the dominant font used in a given text page. This has led to some recent studies that show promising methods for extracting character prototypes from a text image provided that truth is given for part of the image. In this paper we investigate the feasibility of such a strategy without dependence on ground truth. We replace the needed truth by results of direct recognition of some frequently occurring words. The method makes use of the observation that over half of the words in a typical English text passage are contained in a very small lexicon
Keywords :
computer vision; document image processing; feature extraction; image segmentation; knowledge based systems; learning systems; optical character recognition; English text; OCR; adaptive text recognition; bootstrapping; character segmentation; stop words; word recognition; Application software; Character recognition; Computer science; Image analysis; Image recognition; Optical character recognition software; Prototypes; Shape; Testing; Text recognition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on
Conference_Location :
Brisbane, Qld.
ISSN :
1051-4651
Print_ISBN :
0-8186-8512-3
Type :
conf
DOI :
10.1109/ICPR.1998.711216
Filename :
711216