DocumentCode
1994294
Title
Fast lexicon-based word recognition in noisy index card images
Author
Lucas, Simon M. ; Patoulas, Gregory ; Downton, Andy C.
Author_Institution
Comput. Sci. Dept., Essex Univ., Colchester, UK
fYear
2003
fDate
3-6 Aug. 2003
Firstpage
462
Abstract
This paper describes a complete system for reading type-written lexicon words in noisy images, in this case museum index cards. The system is conceptually simple and straightforward to implement. It involves three stages of processing. The first stage extracts row regions from the image, where each row is a hypothesized line of text. The second stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. The third stage searches this graph with a priority-queue-based algorithm for the best matches with a set of words (the lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and reasonable throughput. The priority-queue algorithm is over two hundred times faster than flat dynamic programming on these graphs.
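The third stage can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a generic best-first (uniform-cost) search written under stated assumptions. The graph layout (`edges` mapping a node to `(next_node, char, cost)` triples), the non-negative edge costs (for example, negative log classifier scores), the prefix-set pruning, and the function name `best_lexicon_match` are all hypothetical choices made for illustration.

```python
import heapq


def best_lexicon_match(edges, start, goal, lexicon):
    """Return (cost, word) for the cheapest start-to-goal path that spells
    a lexicon word, or None if no such path exists.

    Assumes all edge costs are non-negative, so the first complete match
    popped from the priority queue is also the cheapest one.
    """
    words = set(lexicon)
    # Every prefix of every lexicon word; partial paths whose spelling
    # falls outside this set cannot lead to a match and are pruned.
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}

    queue = [(0.0, start, "")]  # (accumulated cost, graph node, spelling so far)
    seen = set()
    while queue:
        cost, node, prefix = heapq.heappop(queue)
        if node == goal and prefix in words:
            return cost, prefix
        if (node, prefix) in seen:
            continue  # already expanded at an equal or lower cost
        seen.add((node, prefix))
        for nxt, ch, edge_cost in edges.get(node, []):
            candidate = prefix + ch
            if candidate in prefixes:
                heapq.heappush(queue, (cost + edge_cost, nxt, candidate))
    return None


# Hypothetical character hypothesis graph: node -> [(next_node, char, cost)].
example_edges = {
    0: [(1, "c", 0.2), (1, "e", 0.9), (2, "d", 1.5)],
    1: [(2, "a", 0.4), (2, "o", 0.3)],
    2: [(3, "t", 0.1), (3, "l", 0.8)],
}
example_lexicon = ["cat", "cot", "eat", "dog"]
result = best_lexicon_match(example_edges, 0, 3, example_lexicon)
print(result[1])  # cot
```

Because the queue always expands the cheapest partial hypothesis first and discards spellings that are not lexicon prefixes, most of the graph is never visited. That is the general reason a priority-queue search can outperform a flat dynamic programming pass that scores every lexicon word against the whole graph.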
Keywords
feature extraction; image classification; image denoising; image matching; image recognition; optical character recognition; OCR classifier; character hypothesis graph; flat dynamic programming; lexicon-based word recognition; museum index cards; noisy images; noisy index card images; performance evaluation; priority-queue based algorithm; type-written lexicon words; Algorithm design and analysis; Computer science; Dynamic programming; Image recognition; Image segmentation; Optical character recognition software; Packaging machines; Search methods; Systems engineering and theory; Throughput;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN
0-7695-1960-1
Type
conf
DOI
10.1109/ICDAR.2003.1227708
Filename
1227708