  • DocumentCode
    595405
  • Title
    Learning features for predicting OCR accuracy
  • Author
    Ye, Peng; Doermann, David
  • Author_Institution
    Inst. for Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    3204
  • Lastpage
    3207
  • Abstract
    In this paper, we present a new method for assessing the quality of degraded document images using unsupervised feature learning. The goal is to build a computational model to automatically predict OCR accuracy of a degraded document image without a reference image. Current approaches for this problem typically rely on hand-crafted features whose design is based on heuristic rules that may not be generalizable. In contrast, we explore an unsupervised feature learning framework to learn effective and efficient features for predicting OCR accuracy. Our experimental results, on a set of historic newspaper images, show that the proposed method outperforms a baseline method which combines features from previous works.
  • Keywords
    document image processing; optical character recognition; publishing; unsupervised learning; OCR prediction accuracy; computational model; degraded document images; hand-crafted features; heuristic rules; historic newspaper images; reference image; unsupervised feature learning framework; Accuracy; Degradation; Feature extraction; Humans; Optical character recognition software; Predictive models; Speckle
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2012 21st International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460846