DocumentCode :
153371
Title :
OCR Performance Prediction Using a Bag of Allographs and Support Vector Regression
Author :
Bhowmik, Tapan Kumar ; Paquet, T. ; Ragot, N.
Author_Institution :
LITIS EA-4108, Univ. de Rouen, Rouen, France
fYear :
2014
fDate :
7-10 April 2014
Firstpage :
202
Lastpage :
206
Abstract :
In this paper, we describe a novel and simple technique for predicting OCR results without running any OCR. The technique uses a bag of allographs to characterize textual components. A support vector regression (SVR) model is then trained on the bag-of-allographs representation to build a predictor. The performance of the system is evaluated on a corpus of historical documents. The proposed technique predicts OCR results on training and test documents to within a standard deviation of 4.18% and 6.54%, respectively. The system has been designed as a tool to assist libraries in selecting corpora and to indicate the typical performance that can be expected on the selection.
Keywords :
document image processing; optical character recognition; regression analysis; support vector machines; OCR performance prediction; SVR technique; bag of allographs; historical documents; standard deviation; support vector regression; system performance; textual components; Accuracy; Buildings; Image edge detection; Libraries; Optical character recognition software; Training; Vectors; Bag of Allographs; Historical Documents; OCR; OCR Performance Prediction; Support Vector Regression (SVR); Template Matching;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on
Conference_Location :
Tours
Print_ISBN :
978-1-4799-3243-6
Type :
conf
DOI :
10.1109/DAS.2014.72
Filename :
6830998