DocumentCode
344183
Title
Robust OCR of degraded documents
Author
Natarajan, Premkumar ; Bazzi, Issam ; Lu, Zhidong ; Makhoul, John ; Schwartz, Richard
Author_Institution
GTE Corp., Cambridge, MA, USA
fYear
1999
fDate
20-22 Sep 1999
Firstpage
357
Lastpage
361
Abstract
This paper is concerned with techniques for performing robust OCR of degraded documents, such as faxed text, using a hidden Markov model (HMM) based OCR system. We present two strategies for dealing with degraded documents. The first strategy is to train the system on degraded documents that have been subjected to the same, or a similar, degradation process as the documents to be recognized. The second, more sophisticated strategy is to use adaptation to adjust the parameters of the trained model in order to improve recognition accuracy on a specific document. This adjustment of model parameters is typically posed as a constrained optimization problem in which a prespecified objective function is to be optimized. We present a comparative study of two objective functions: the likelihood function and the posterior probability. A variation of the basic posterior probability method is also discussed. Using adaptation with a model trained on fax-degraded data, we have reduced the character error rate on fax-degraded text images generated from the University of Washington English Image Database I by a factor of three.
Keywords
document image processing; facsimile; hidden Markov models; optical character recognition; optimisation; probability; University of Washington English Image Database I; adaptation; character error rate; constrained optimization problem; degraded documents; faxed text; hidden Markov model based OCR system; likelihood function; objective function; parameter adjustment; posterior probability; recognition accuracy; robust OCR; training; Adaptation model; Character generation; Constraint optimization; Degradation; Error analysis; Hidden Markov models; Image databases; Image generation; Optical character recognition software; Robustness;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location
Bangalore
Print_ISBN
0-7695-0318-7
Type
conf
DOI
10.1109/ICDAR.1999.791798
Filename
791798