Title :
Word spotting in scanned images using hidden Markov models
Author :
Chen, Francine R. ; Wilcox, Lynn D. ; Bloomberg, Dan S.
Author_Institution :
Xerox Palo Alto Res. Center, CA, USA
Abstract :
A hidden-Markov-model (HMM)-based system for font-independent spotting of user-specified keywords in a scanned image is described. Word bounding boxes of potential keywords are extracted from the image using a morphology-based preprocessor. Feature vectors based on the external shape and internal structure of the word are computed over vertical columns of pixels in a word bounding box. For each user-specified keyword, an HMM is created by concatenating appropriate context-dependent character HMMs. Nonkeywords are modeled using an HMM based on context-dependent subcharacter models. Keyword spotting is performed using a Viterbi search through the HMM network created by connecting the keyword and nonkeyword HMMs in parallel. Applications of word-image spotting include information filtering in images from facsimile and copy machines, and information retrieval from text image databases.
Keywords :
hidden Markov models; image segmentation; mathematical morphology; optical character recognition; search problems; HMM network; Viterbi search; context-dependent subcharacter models; font-independent spotting; morphology-based preprocessor; scanned images; user-specified keywords; word bounding box; word-image spotting
Conference_Title :
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on
Conference_Location :
Minneapolis, MN, USA
Print_ISBN :
0-7803-7402-9
DOI :
10.1109/ICASSP.1993.319732