DocumentCode :
595405
Title :
Learning features for predicting OCR accuracy
Author :
Ye, Peng ; Doermann, David
Author_Institution :
Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
3204
Lastpage :
3207
Abstract :
In this paper, we present a new method for assessing the quality of degraded document images using unsupervised feature learning. The goal is to build a computational model to automatically predict OCR accuracy of a degraded document image without a reference image. Current approaches for this problem typically rely on hand-crafted features whose design is based on heuristic rules that may not be generalizable. In contrast, we explore an unsupervised feature learning framework to learn effective and efficient features for predicting OCR accuracy. Our experimental results, on a set of historic newspaper images, show that the proposed method outperforms a baseline method which combines features from previous works.
Keywords :
document image processing; optical character recognition; publishing; unsupervised learning; OCR prediction accuracy; computational model; degraded document images; hand-crafted features; heuristic rules; historic newspaper images; reference image; unsupervised feature learning framework; Accuracy; Degradation; Feature extraction; Humans; Optical character recognition software; Predictive models; Speckle
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2012 21st International Conference on Pattern Recognition (ICPR)
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460846