DocumentCode
595405
Title
Learning features for predicting OCR accuracy
Author
Ye, Peng; Doermann, David
Author_Institution
Inst. for Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
fYear
2012
fDate
11-15 Nov. 2012
Firstpage
3204
Lastpage
3207
Abstract
In this paper, we present a new method for assessing the quality of degraded document images using unsupervised feature learning. The goal is to build a computational model to automatically predict OCR accuracy of a degraded document image without a reference image. Current approaches for this problem typically rely on hand-crafted features whose design is based on heuristic rules that may not be generalizable. In contrast, we explore an unsupervised feature learning framework to learn effective and efficient features for predicting OCR accuracy. Our experimental results, on a set of historic newspaper images, show that the proposed method outperforms a baseline method which combines features from previous works.
Keywords
document image processing; optical character recognition; publishing; unsupervised learning; OCR prediction accuracy; computational model; degraded document images; hand-crafted features; heuristic rules; historic newspaper images; reference image; unsupervised feature learning framework; Accuracy; Degradation; Feature extraction; Humans; Optical character recognition software; Predictive models; Speckle
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location
Tsukuba
ISSN
1051-4651
Print_ISBN
978-1-4673-2216-4
Type
conf
Filename
6460846