Title : 
An Overview of the Tesseract OCR Engine
         
            Author_Institution : 
Google Inc., Mountain View
         
            Abstract : 
This paper presents a comprehensive overview of the Tesseract OCR engine, which appeared as the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, in particular the line finding, the features/classification methods, and the adaptive classifier.
         
        
            Keywords : 
image classification; optical character recognition; Tesseract OCR engine; UNLV; adaptive classifier; line finding; Filters; Independent component analysis; Inspection; Open source software; Optical character recognition software; Pipelines; Prototypes; Search engines; Testing; Text recognition;
         
            Conference_Title : 
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)
         
        
            Conference_Location : 
Parana
         
            Print_ISBN : 
978-0-7695-2822-9
         
            DOI : 
10.1109/ICDAR.2007.4376991