A hidden Markov model for language syntax in text recognition

Author

Hull, Jonathan J.

Author_Institution

Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA

fYear

1992

fDate

30 Aug-3 Sep 1992

Firstpage

124

Lastpage

127

Abstract

This paper proposes using a hidden Markov model (HMM) of language syntax to improve the performance of a text recognition algorithm. Syntactic constraints are described by the transition probabilities between word classes, and the confusion between the feature string for a word and the various syntactic classes is also described probabilistically. A modification of the Viterbi algorithm is proposed that finds a fixed number of syntactic class sequences for a given sentence, namely those with the highest probabilities of occurrence given the feature strings for the words. The approach is demonstrated experimentally with a word hypothesization algorithm that produces a number of guesses about the identity of each word in running text. The use of first- and second-order transition probabilities is explored. Overall, the average number of words that can match a given image is reduced by between 65 and 80 percent.
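
The abstract does not give the details of the modified Viterbi algorithm, so the following is only a minimal sketch of one common way to find a fixed number of best class sequences: keep the k best partial paths per state at every word position instead of a single one. The function name `top_k_viterbi`, the class labels, the feature strings "a", "b", "c", and all probability values are hypothetical and chosen for illustration. Only first-order transition probabilities are used here.

```python
import heapq
import math


def top_k_viterbi(obs, states, start_p, trans_p, emit_p, k):
    """Return up to k most probable state sequences as (probability, path) pairs.

    Illustrative k-best variant of Viterbi (an assumption, not the paper's exact
    method): each state keeps its k best partial paths at every position.
    """
    if not obs:
        return []
    # beams[s] holds up to k (log_prob, path) candidates that end in state s.
    beams = {}
    for s in states:
        p = start_p.get(s, 0.0) * emit_p[s].get(obs[0], 0.0)
        beams[s] = [(math.log(p), (s,))] if p > 0 else []
    for o in obs[1:]:
        new_beams = {}
        for s in states:
            e = emit_p[s].get(o, 0.0)
            cands = []
            if e > 0:
                for r in states:
                    t = trans_p[r].get(s, 0.0)
                    if t > 0:
                        step = math.log(t) + math.log(e)
                        for lp, path in beams[r]:
                            cands.append((lp + step, path + (s,)))
            # Keep only the k best partial paths ending in s.
            new_beams[s] = heapq.nlargest(k, cands)
        beams = new_beams
    final = [c for s in states for c in beams[s]]
    return [(math.exp(lp), list(path)) for lp, path in heapq.nlargest(k, final)]


# Hypothetical toy model: three syntactic classes, three feature strings.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
# emit_p[c][f]: probability that a word of class c yields feature string f.
emit_p = {
    "DET": {"a": 0.8, "b": 0.1, "c": 0.1},
    "NOUN": {"a": 0.1, "b": 0.6, "c": 0.3},
    "VERB": {"a": 0.1, "b": 0.3, "c": 0.6},
}
obs = ["a", "b", "c"]

best = top_k_viterbi(obs, states, start_p, trans_p, emit_p, k=2)
for prob, path in best:
    print(f"{prob:.6f}", " ".join(path))
```

In a setting like the one the abstract describes, the returned class sequences could then be used to discard word hypotheses whose syntactic class does not fit any of the retained sequences. That filtering step is not shown here.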

Keywords

Markov processes; character recognition; grammars; Viterbi algorithm; hidden Markov model; language syntax; syntactic constraints; text recognition; word class transition probabilities; Algorithm design and analysis; Dictionaries; Hidden Markov models; Image analysis; Performance analysis; Shape; Text analysis

fLanguage

English

Publisher

IEEE

Conference_Titel

Proceedings of the 11th IAPR International Conference on Pattern Recognition, 1992. Vol. II. Conference B: Pattern Recognition Methodology and Systems

Conference_Location

The Hague

Print_ISBN

0-8186-2915-0

Type

conf

DOI

10.1109/ICPR.1992.201736

Filename

201736