Title : 
OCR Performance Prediction Using a Bag of Allographs and Support Vector Regression
         
        
            Author : 
Bhowmik, Tapan Kumar ; Paquet, T. ; Ragot, N.
         
        
            Author_Institution : 
LITIS EA-4108, Univ. de Rouen, Rouen, France
         
        
        
        
        
        
            Abstract : 
In this paper, we describe a novel and simple technique for predicting OCR results without running any OCR engine. The technique uses a bag of allographs to characterize textual components; a support vector regression (SVR) model is then trained on this representation to build the predictor. The performance of the system is evaluated on a corpus of historical documents. The proposed technique predicts OCR results on training and test documents to within a standard deviation of 4.18% and 6.54%, respectively. The proposed system has been designed as a tool to assist libraries in selecting corpora and to specify the typical OCR performance that can be expected on the selection.
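The bag-of-allographs representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' features, codebook construction, or matching procedure, so everything below is an assumption made for illustration: each textual component is reduced to a small numeric feature vector, the allograph codebook is a hand-written list of prototype vectors, and a component is assigned to its nearest prototype by Euclidean distance (a stand-in for whatever matching the paper actually uses). The names `nearest_allograph` and `bag_of_allographs` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_allograph(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook prototype closest to `feature`.

    Assumption: Euclidean distance stands in for the real matching step.
    """
    return min(range(len(codebook)), key=lambda k: math.dist(feature, codebook[k]))


def bag_of_allographs(components: Sequence[Vector],
                      codebook: Sequence[Vector]) -> List[float]:
    """Build a normalized histogram of allograph assignments for one document."""
    counts = [0] * len(codebook)
    for feature in components:
        counts[nearest_allograph(feature, codebook)] += 1
    total = sum(counts)
    # An empty document yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy data (hypothetical): three allograph prototypes and four components.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
components = [(0.1, 0.0), (0.9, 1.1), (0.8, 0.9), (0.1, 0.9)]
histogram = bag_of_allographs(components, codebook)
print(histogram)
```

In a pipeline of the kind the abstract describes, one such histogram per document would presumably serve as the input vector of a regressor, for example scikit-learn's `sklearn.svm.SVR`, trained against OCR accuracy measured on a labeled subset; that wiring is likewise an assumption and is not shown here.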
         
        
            Keywords : 
document image processing; optical character recognition; regression analysis; support vector machines; OCR performance prediction; SVR technique; bag of allographs; historical documents; standard deviation; support vector regression; system performance; textual components; Accuracy; Buildings; Image edge detection; Libraries; Optical character recognition software; Training; Vectors; OCR; Template Matching
         
        
        
        
            Conference_Title : 
Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on
         
        
            Conference_Location : 
Tours
         
        
            Print_ISBN : 
978-1-4799-3243-6
         
        
        
            DOI : 
10.1109/DAS.2014.72