Regional Information Center for Science and Technology - Bootstrapping text recognition from stop words

DocumentCode :

2708901

Title :

Bootstrapping text recognition from stop words

Author :

Ho, Tin Kam

Author_Institution :

Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA

Volume :

fYear :

1998

fDate :

16-20 Aug 1998

Firstpage :

605

Abstract :

Recognition of arbitrary noisy English text has been difficult because of problems in character segmentation and multi-font symbol classification. Both segmentation and recognition can be easier with more knowledge of the dominant font used in a given text page. This has led to some recent studies that show promising methods for extracting character prototypes from a text image provided that truth is given for part of the image. In this paper we investigate the feasibility of such a strategy without dependence on ground truth. We replace the needed truth by results of direct recognition of some frequently occurring words. The method makes use of the observation that over half of the words in a typical English text passage are contained in a very small lexicon

Keywords :

computer vision; document image processing; feature extraction; image segmentation; knowledge based systems; learning systems; optical character recognition; English text; OCR; adaptive text recognition; bootstrapping; character segmentation; stop words; word recognition; Application software; Character recognition; Computer science; Image analysis; Image recognition; Optical character recognition software; Prototypes; Shape; Testing; Text recognition

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on

Conference_Location :

Brisbane, Qld.

ISSN :

1051-4651

Print_ISBN :

0-8186-8512-3

Type :

conf

DOI :

10.1109/ICPR.1998.711216

Filename :

711216

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2708901