Title :
Modelling polyfont printed characters with HMMs and a shift invariant Hamming distance
Author :
Elms, A.J. ; Illingworth, J.
Author_Institution :
Dept. of Electron. & Electr. Eng., Surrey Univ., Guildford, UK
Abstract :
Rumours of the death of the problem of machine-printed text recognition have been greatly exaggerated. Reported results can be good enough to suggest that this is a “solved problem”, but closer analysis reveals that the test data is often limited in its range of fonts and point sizes. Worse still, results are commonly quoted for noise-free images, ignoring the problems of recognising “real” documents such as faxes. Various methods have been proposed for modelling characters with Hidden Markov Models (HMMs). The authors, amongst others, have suggested representing a character by analysing the pixel pattern in each column of its image and linking the sequence of column patterns together with an HMM. In this paper we propose a method of quantising these patterns by means of a Shift Invariant Hamming Distance. A full experimental evaluation (45 fonts, 5 point sizes) under typical noise gives a recognition accuracy of 99% within the top-3 choices, and 94% top-choice accuracy for the best font. The method has a significant advantage in recognising noisy word images, because classification is achieved without prior segmentation of the word into characters.
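The abstract does not define the Shift Invariant Hamming Distance itself. The Python sketch that follows assumes one plausible reading: each image column is a binary vector of fixed height, and the distance between two columns is the smallest Hamming distance obtained when one of them is shifted vertically by up to `max_shift` pixels, with vacated cells filled with background. Each column is then quantised to the index of its nearest codeword, producing the discrete symbol sequence that an HMM would model. The function names, the `max_shift` parameter and the toy codebook are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence

Column = Sequence[int]  # one image column, top to bottom: 1 = ink, 0 = background


def shift(col: Column, s: int) -> List[int]:
    """Shift a column vertically by s pixels (positive = down), filling vacated cells with 0."""
    h = len(col)
    out = [0] * h
    for i, v in enumerate(col):
        j = i + s
        if 0 <= j < h:
            out[j] = v
    return out


def hamming(a: Column, b: Column) -> int:
    """Number of positions at which two equal-length columns differ."""
    return sum(x != y for x, y in zip(a, b))


def shift_invariant_hamming(a: Column, b: Column, max_shift: int = 2) -> int:
    """ASSUMED definition: minimum Hamming distance between a and b
    over vertical shifts of b in [-max_shift, max_shift]."""
    if len(a) != len(b):
        raise ValueError("columns must have the same height")
    return min(hamming(a, shift(b, s)) for s in range(-max_shift, max_shift + 1))


def quantise_columns(columns: Sequence[Column], codebook: Sequence[Column],
                     max_shift: int = 2) -> List[int]:
    """Map each column to the index of its nearest codeword (first index wins ties),
    giving a discrete observation sequence for an HMM."""
    return [
        min(range(len(codebook)),
            key=lambda k: shift_invariant_hamming(col, codebook[k], max_shift))
        for col in columns
    ]


if __name__ == "__main__":
    # Hypothetical toy codebook; a real one would be learned from training data.
    codebook = [
        (0, 0, 0, 0, 0, 0),  # 0: blank column
        (0, 1, 1, 1, 1, 0),  # 1: vertical stroke
        (0, 1, 0, 0, 0, 0),  # 2: single ink pixel near the top
    ]
    columns = [
        (0, 0, 0, 0, 0, 0),  # blank
        (1, 1, 1, 1, 0, 0),  # stroke shifted up by one row
        (0, 0, 0, 1, 0, 0),  # single pixel, two rows lower than codeword 2
    ]
    print(quantise_columns(columns, codebook))               # [0, 1, 2]
    print(quantise_columns(columns, codebook, max_shift=0))  # [0, 1, 0]
```

With `max_shift=0` the measure reduces to plain Hamming distance, and the displaced single pixel is assigned to the blank codeword; allowing small vertical shifts matches it to codeword 2 at distance 0. This is the kind of tolerance to baseline and position variation that a shift-invariant measure is meant to provide.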
Keywords :
Hamming codes; hidden Markov models; optical character recognition; vector quantisation; machine-printed text recognition; noise-free images; noisy word images; polyfont printed characters; sequential column patterns; shift invariant Hamming distance; Data analysis; Hamming distance; Hidden Markov models; Image analysis; Image recognition; Joining processes; Pattern analysis; Pixel; Testing; Text recognition
Conference_Title :
Proceedings of the Third International Conference on Document Analysis and Recognition, 1995
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.599044