DocumentCode
2198339
Title
User-Defined Expected Error Rate in OCR Postprocessing by Means of Automatic Threshold Estimation
Author
Navarro-Cerdan, J. Ramon ; Arlandis, Joaquim ; Perez-Cortes, Juan-Carlos ; Llobet, Rafael
Author_Institution
Inst. Tecnol. de Inf., Univ. Politec. de Valencia, Valencia, Spain
fYear
2010
fDate
16-18 Nov. 2010
Firstpage
405
Lastpage
409
Abstract
In this work, a method is presented for automatically estimating a threshold that allows the user of an OCR system to define an expected error rate. When the OCR output is post-processed with a language model, a probability, a reliability index, or a “transformation cost” is usually obtained, reflecting the likelihood (or its inverse) that the string of OCR hypotheses belongs to the model. By applying a threshold to this index (or cost) to reject the less reliable hypotheses, a variable level of expected accuracy can be imposed on the output. For the user, it is much more convenient to “fix” the expected error rate at an acceptable level than to deal with an arbitrary threshold. Naturally, fixing the error rate in this way leads to high reject rates for difficult tasks and lower reject rates for easier tasks.
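The idea of turning a target error rate into a reject threshold can be illustrated with a minimal sketch. It assumes a labelled validation set where each OCR hypothesis has a transformation cost (lower means more reliable) and a flag saying whether it was correct, and it simply picks the largest cost threshold whose empirical error rate on accepted hypotheses stays at or below the target. This empirical scan is a generic illustration, not necessarily the estimation procedure used in the paper; the function names `estimate_threshold` and `reject_rate` are hypothetical.

```python
from typing import Optional, Sequence


def estimate_threshold(
    costs: Sequence[float], correct: Sequence[bool], target_error: float
) -> Optional[float]:
    """Hypothetical helper: return the largest cost threshold t such that the
    validation hypotheses with cost <= t have an error rate <= target_error.

    Assumes lower cost = more reliable. Returns None if every non-empty
    acceptance set exceeds the target (i.e. everything must be rejected).
    """
    if len(costs) != len(correct):
        raise ValueError("costs and correct must have the same length")
    pairs = sorted(zip(costs, correct))  # most reliable hypotheses first
    best: Optional[float] = None
    errors = 0
    i = 0
    n = len(pairs)
    while i < n:
        c = pairs[i][0]
        # Hypotheses with equal cost are accepted or rejected together.
        while i < n and pairs[i][0] == c:
            if not pairs[i][1]:
                errors += 1
            i += 1
        # The error rate among accepted items is not monotonic in the
        # threshold, so keep scanning and remember the largest valid one
        # (largest threshold = lowest reject rate).
        if errors / i <= target_error:
            best = c
    return best


def reject_rate(costs: Sequence[float], threshold: Optional[float]) -> float:
    """Fraction of hypotheses rejected (cost above threshold)."""
    if threshold is None:
        return 1.0
    return sum(c > threshold for c in costs) / len(costs)


if __name__ == "__main__":
    # Toy validation data (invented for illustration only).
    costs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    correct = [True, True, True, False, True, False]
    t = estimate_threshold(costs, correct, target_error=0.25)
    print(t)                                 # 0.5
    print(f"{reject_rate(costs, t):.3f}")    # 0.167
```

A stricter target (for example `target_error=0.0`) yields a lower threshold and therefore a higher reject rate, which mirrors the trade-off described in the abstract.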
Keywords
image segmentation; optical character recognition; probability; OCR postprocessing; automatic threshold estimation; language model; reliability index; user defined expected error rate;
fLanguage
English
Publisher
IEEE
Conference_Titel
Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
Conference_Location
Kolkata
Print_ISBN
978-1-4244-8353-2
Type
conf
DOI
10.1109/ICFHR.2010.126
Filename
5693597