• DocumentCode
    2198339
  • Title

    User-Defined Expected Error Rate in OCR Postprocessing by Means of Automatic Threshold Estimation

  • Author

    Navarro-Cerdan, J. Ramon ; Arlandis, Joaquim ; Perez-Cortes, Juan-Carlos ; Llobet, Rafael

  • Author_Institution
    Inst. Tecnol. de Inf., Univ. Politec. de Valencia, Valencia, Spain
  • fYear
    2010
  • fDate
    16-18 Nov. 2010
  • Firstpage
    405
  • Lastpage
    409
  • Abstract
    This work presents a method for automatically estimating a threshold that lets the user of an OCR system define an expected error rate. When the OCR output is post-processed using a language model, a probability, a reliability index, or a “transformation cost” is usually obtained, reflecting the likelihood (or its inverse) that the string of OCR hypotheses belongs to the model. By applying a threshold to this index (or cost) to reject the less reliable hypotheses, a variable level of expected accuracy can be imposed on the output. It is much more convenient for the user to be able to “fix” the expected error rate at an acceptable level than to deal with an arbitrary threshold. Of course, the result will always be higher reject rates for difficult tasks and lower reject rates for easier tasks.
  • Keywords
    image segmentation; optical character recognition; probability; OCR postprocessing; automatic threshold estimation; language model; reliability index; user defined expected error rate
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
  • Conference_Location
    Kolkata
  • Print_ISBN
    978-1-4244-8353-2
  • Type

    conf

  • DOI
    10.1109/ICFHR.2010.126
  • Filename
    5693597
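
The idea described in the abstract, choosing a rejection threshold so that the accepted output meets a user-defined error rate, can be illustrated with a minimal sketch. This is not the paper's estimation procedure, which the record does not describe. It assumes a labeled validation set of (score, is_correct) pairs in which a higher score means a more reliable hypothesis; the names `estimate_threshold`, `reject_rate`, and `validation`, and the sample values, are hypothetical.

```python
from typing import List, Optional, Tuple

Scored = List[Tuple[float, bool]]  # (reliability score, hypothesis was correct)


def estimate_threshold(scored: Scored, target_error: float) -> Optional[float]:
    """Return the lowest score threshold whose accepted set (score >= threshold)
    has an error rate <= target_error on the validation data, or None if no
    threshold satisfies the target."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = None
    errors = 0
    for i, (score, correct) in enumerate(ranked, start=1):
        if not correct:
            errors += 1
        # Only cut between distinct scores, since tied scores are
        # accepted or rejected together.
        at_boundary = i == len(ranked) or ranked[i][0] != score
        if at_boundary and errors / i <= target_error:
            best = score
    return best


def reject_rate(scored: Scored, threshold: Optional[float]) -> float:
    """Fraction of hypotheses rejected at the given threshold."""
    if not scored or threshold is None:
        return 1.0
    rejected = sum(1 for score, _ in scored if score < threshold)
    return rejected / len(scored)


# Hypothetical validation data, for illustration only.
validation: Scored = [
    (0.99, True), (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.60, False), (0.50, True), (0.40, False),
]

for target in (0.2, 0.0):
    t = estimate_threshold(validation, target)
    print(target, t, reject_rate(validation, t))
```

On this toy data, a stricter target error rate yields a higher threshold and a higher reject rate, which matches the trade-off noted at the end of the abstract.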