Title :
Robust OCR of degraded documents
Author :
Natarajan, Premkumar ; Bazzi, Issam ; Lu, Zhidong ; Makhoul, John ; Schwartz, Richard
Author_Institution :
GTE Corp., Cambridge, MA, USA
Abstract :
This paper is concerned with techniques for performing robust OCR of degraded documents, such as faxed text, using a hidden Markov model (HMM) based OCR system. We present two strategies for dealing with degraded documents. The first strategy is to train the system on degraded documents that have been subjected to the same, or similar, degradation process as the documents to be recognized. The second, more sophisticated, strategy is to use adaptation to adjust the parameters of the trained model in order to improve recognition accuracy on a specific document. This adjustment of model parameters is typically posed as a constrained optimization problem wherein a certain prespecified objective function is to be optimized. We present a comparative study of two objective functions: the likelihood function and the posterior probability. A variation of the basic posterior probability method is also discussed. Using adaptation with a model trained on fax-degraded data, we have reduced, by a factor of three, the character error rate on fax-degraded text images generated from the University of Washington English Image Database I.
Keywords :
document image processing; facsimile; hidden Markov models; optical character recognition; optimisation; probability; University of Washington English Image Database I; adaptation; character error rate; constrained optimization problem; degraded documents; faxed text; hidden Markov model based OCR system; likelihood function; objective function; parameter adjustment; posterior probability; recognition accuracy; robust OCR; training; Adaptation model; Character generation; Constraint optimization; Degradation; Error analysis; Hidden Markov models; Image databases; Image generation; Optical character recognition software; Robustness;
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
DOI :
10.1109/ICDAR.1999.791798