DocumentCode :
1137966
Title :
Keyword spotting in poorly printed documents using pseudo 2-D hidden Markov models
Author :
Kuo, Shyh-shiaw ; Agazzi, Oscar E.
Author_Institution :
AT&T Bell Labs., Somerset, NJ, USA
Volume :
16
Issue :
8
fYear :
1994
fDate :
8/1/1994
Firstpage :
842
Lastpage :
848
Abstract :
An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, called pseudo 2-D hidden Markov models, are created for representing the actual keyword and all the other extraneous words, respectively. Dynamic programming is then used for matching an unknown input word with the two models and for making a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough in characterizing printed words efficiently. These models facilitate a nice “elastic matching” property in both horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database that contains about 26000 words. Currently, the authors achieve a recognition accuracy of 99% when words in testing and training sets are of the same font size, and 96% when they are in different sizes. In the latter case, the conventional 1-D HMM achieves only a 70% accuracy rate.
Keywords :
decision theory; document image processing; dynamic programming; hidden Markov models; maximum likelihood estimation; optical character recognition; statistical models; elastic matching; keyword spotting; maximum likelihood decision; poorly printed documents; pseudo 2-D hidden Markov models; recognition accuracy; robust machine recognition; testing; training sets; Character recognition; Nonlinear optics; Optical character recognition software; Optical distortion; Optical scattering; Pattern recognition; Speech recognition; Text recognition
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/34.308482
Filename :
308482