Title :
Automatic Selection of Binarization Method for Robust OCR
Author :
Chattopadhyay, Taraprasad ; Reddy, V.R. ; Garain, U.
Author_Institution :
Innovation Lab., Tata Consultancy Services, Kolkata, India
Abstract :
Many algorithms are now available for the same task in document image analysis (DIA), e.g. binarization, page segmentation, and character recognition, and choosing the right algorithm(s) for a particular task is often non-trivial. This paper proposes a model for automatically selecting the correct algorithm(s) for a given problem, using binarization as a reference task to illustrate the approach. Several previously unexplored issues are addressed in this work. For example, a single method may not be good for binarizing an entire document, whereas a particular method may produce the desired result for a particular region. Therefore, for a given document image, our model selects a set of one or more binarization techniques suited to different regions of the document. This selection is completely automatic and guided by machine learning. A completely automatic way of generating annotated data for training the learning algorithms is also a novel contribution of this work. The approach is evaluated on the ICDAR 2003 Robust Reading data set, and the results highlight the potential of the proposed approach for automatically selecting the correct DIA algorithm(s) from several alternatives.
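The abstract does not specify the candidate methods, region features, learning algorithms, or the criterion used to generate training labels automatically. The following is therefore only a minimal, hypothetical Python sketch of the general idea. Each training region is labeled automatically with whichever candidate binarization method best reproduces a reference, a classifier maps region features to that label, and each region of a new page then gets its own method. Everything named here is an illustrative assumption, not the authors' method: the three candidates (`otsu`, `otsu_inverted`, `fixed_128`), pixel agreement with a reference mask as the labeling criterion (the source does not say what reference its automatic annotation uses), the two features computed in `features`, and the 1-nearest-neighbour selector.

```python
from math import dist
from statistics import mean

Image = list[list[int]]  # one grayscale region, pixel values 0..255 (0 = black)
Mask = list[list[int]]   # binarized region: 1 = ink (text), 0 = background

def otsu_threshold(img: Image) -> int:
    """Otsu's method: the t maximising between-class variance (class 0 = pixels <= t)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is undefined
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Hypothetical pool of candidate binarization methods (stand-ins, not the paper's set).
def otsu_dark_text(img: Image) -> Mask:
    t = otsu_threshold(img)
    return [[1 if p <= t else 0 for p in row] for row in img]

def otsu_light_text(img: Image) -> Mask:
    t = otsu_threshold(img)
    return [[1 if p > t else 0 for p in row] for row in img]

def fixed_128(img: Image) -> Mask:
    return [[1 if p < 128 else 0 for p in row] for row in img]

METHODS = {"otsu": otsu_dark_text, "otsu_inverted": otsu_light_text, "fixed_128": fixed_128}

def features(img: Image) -> tuple[float, float]:
    """Hypothetical features: mean brightness and fraction of pixels above the Otsu threshold."""
    pixels = [p for row in img for p in row]
    t = otsu_threshold(img)
    return mean(pixels) / 255, sum(p > t for p in pixels) / len(pixels)

def agreement(a: Mask, b: Mask) -> float:
    same = [x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(same) / len(same)

def auto_label(img: Image, reference: Mask) -> str:
    """Automatic annotation: the candidate whose output best matches the reference (ties: first listed)."""
    return max(METHODS, key=lambda name: agreement(METHODS[name](img), reference))

def train(regions: list[tuple[Image, Mask]]) -> list[tuple[tuple[float, float], str]]:
    """'Training' a 1-nearest-neighbour selector just stores (features, label) pairs."""
    return [(features(img), auto_label(img, ref)) for img, ref in regions]

def select_method(model, img: Image) -> str:
    f = features(img)
    return min(model, key=lambda item: dist(item[0], f))[1]

def binarize_document(model, regions: list[Image]) -> list[tuple[str, Mask]]:
    """Pick a method per region; the distinct names form the document's method set."""
    chosen = []
    for img in regions:
        name = select_method(model, img)
        chosen.append((name, METHODS[name](img)))
    return chosen

if __name__ == "__main__":
    mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    dim_dark_text = [[110] * 4, [110, 20, 20, 110], [110, 20, 20, 110], [110] * 4]
    light_text = [[40] * 4, [40, 220, 220, 40], [40, 220, 220, 40], [40] * 4]
    model = train([(dim_dark_text, mask), (light_text, mask)])
    page = [
        [[230, 230, 230], [230, 60, 230], [230, 230, 230]],  # dark text, light background
        [[10, 10, 10], [10, 250, 10], [10, 10, 10]],         # light text, dark background
    ]
    for name, binary in binarize_document(model, page):
        print(name, binary)
    # otsu [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    # otsu_inverted [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

In this toy run, `fixed_128` is never chosen: on the dim training region it marks the background as ink, and on the light-text region it inverts the polarity. No single method handles both test regions, which mirrors the abstract's point that one method may not suit a whole document. The nearest-neighbour rule is used only because it needs no dependencies; any classifier trained on the automatically generated labels could take its place.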
Keywords :
document image processing; learning (artificial intelligence); optical character recognition; DIA; ICDAR 2003 Robust Reading data set; binarization method automatic selection; document image analysis; machine learning approach; robust OCR; Accuracy; Algorithm design and analysis; Image segmentation; Machine learning algorithms; Optical character recognition software; Prediction algorithms; Training; Automatic algorithm selection; Automatic generation of training data; Binarization techniques; Evaluation; Machine learning;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.237