Title :
Handwritten and Typewritten Text Identification and Recognition Using Hidden Markov Models
Author :
Cao, Huaigu ; Prasad, Rohit ; Natarajan, Prem
Author_Institution :
Raytheon BBN Technologies, Cambridge, MA, USA
Abstract :
We present a system, based on hidden Markov models (HMMs), for identifying and recognizing handwritten and typewritten text in document images. Text type identification first uses OCR decoding to generate word boundaries and then classifies each word as handwritten or typewritten using HMMs. We show that the contextual constraints imposed by the HMM significantly improve identification performance over the conventional Gaussian mixture model (GMM)-based method. The identified type is then used to estimate the frame sample rate and frame width of the feature sequences for the HMM OCR system, independently for each type. This type-dependent approach to computing the frame sample rate and frame width yields a significant improvement in OCR accuracy over type-independent approaches.
Keywords :
Gaussian processes; document image processing; feature extraction; handwritten character recognition; hidden Markov models; image recognition; text analysis; word processing; Gaussian mixture model-based method; HMM OCR system; OCR accuracy; OCR decoding; contextual constraint; document image; feature sequence; handwritten text identification; handwritten text recognition; hidden Markov model; typewritten text identification; typewritten text recognition; word boundary; Adaptation models; Classification algorithms; Error analysis; Feature extraction; Hidden Markov models; Optical character recognition software; Training; Gaussian mixture model; hidden Markov model; optical character recognition;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2011.155
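
Illustrative sketch (not taken from the paper): the abstract attributes the gain over the GMM baseline to the contextual constraints of the HMM. The toy example below shows the general idea with discrete-observation HMMs scored by the forward algorithm, where a word is assigned to whichever type model gives the higher log-likelihood. Everything here is an assumption made for illustration: the two-state models, their parameters, the quantized frame symbols, and the helper names `log_forward` and `classify_word`. The actual system uses its own features and trained models, which the abstract does not specify.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Sum over previous states (in log space) for each next state.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def classify_word(obs, models):
    """Return the label whose HMM scores the word's frame sequence highest."""
    scores = {label: log_forward(obs, *params) for label, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy parameters: both models share the same emissions and the
# same per-frame symbol marginals, and differ only in their transitions.
log_pi = np.log(np.array([0.5, 0.5]))
log_B = np.log(np.array([[0.9, 0.1],    # state 0 mostly emits symbol 0
                         [0.1, 0.9]]))  # state 1 mostly emits symbol 1
sticky = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))       # states persist
alternating = np.log(np.array([[0.1, 0.9], [0.9, 0.1]]))  # states flip

models = {
    "typewritten": (log_pi, sticky, log_B),       # label assignment is arbitrary
    "handwritten": (log_pi, alternating, log_B),
}

label_a, _ = classify_word([0, 0, 0, 1, 1, 1], models)
label_b, _ = classify_word([0, 1, 0, 1, 0, 1], models)
print(label_a, label_b)
```

Both toy sequences contain the same number of each symbol, so a model that scores frames independently with a shared marginal would assign them identical likelihoods. Only the transition structure separates them, which is the kind of context a frame-independent mixture model cannot use.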