Title :
An OCR for separation and identification of mixed English-Gujarati digits using kNN classifier
Author :
Chaudhari, Shailesh A. ; Gulati, Ravi M.
Author_Institution :
Veer Narmad South Gujarat Univ., Surat, India
Abstract :
This paper addresses the script identification problem in bilingual printed document images. We propose an OCR system that separates and identifies mixed English and Gujarati digits. The system is first trained on standard data samples; test samples are then collected from various printed sources such as newspapers, books, and magazines. Each pre-processed image, which may be of arbitrary size, is normalized to a uniform size. A statistical approach is used for feature extraction, and a kNN classifier is used for classification. The model achieves an average accuracy of 99.26% on Gujarati digits and 99.20% on English digits, for an overall accuracy of 99.23%.
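The abstract names a three-stage pipeline (size normalization, statistical feature extraction, kNN classification) but does not give the target size, the specific statistical features, the distance metric, or the value of k. The Python sketch below fills those gaps with assumptions, not the authors' choices: nearest-neighbour resizing to 16x16, zoning density features on a 4x4 grid, Euclidean distance, and majority voting. Label strings such as `"en_1"` and `"gu_1"` are hypothetical; encoding the script in the label lets one classifier both separate the scripts and identify the digit, which is one possible reading of the title rather than a confirmed design.

```python
import math
from collections import Counter

# Assumed normalization target and zoning grid; the paper does not state them.
SIZE = 16
ZONES = 4


def normalize(image, size=SIZE):
    """Resize a binary image (list of rows of 0/1 pixels) of any size to
    size x size using nearest-neighbour sampling (an assumed method)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def zone_features(image, zones=ZONES):
    """Assumed statistical features: the fraction of foreground pixels in
    each cell of a zones x zones grid over a square normalized image."""
    step = len(image) // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            cell = [image[r][c]
                    for r in range(zr * step, (zr + 1) * step)
                    for c in range(zc * step, (zc + 1) * step)]
            feats.append(sum(cell) / len(cell))
    return feats


def extract(image):
    """Feature pipeline: normalize to a uniform size, then compute zone densities."""
    return zone_features(normalize(image))


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k training vectors nearest
    to it in Euclidean distance. `train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 8x8 glyphs with hypothetical labels: a vertical and a horizontal stroke.
    vertical = [[1 if c in (3, 4) else 0 for c in range(8)] for _ in range(8)]
    horizontal = [[1 if r in (3, 4) else 0 for _ in range(8)] for r in range(8)]
    train = [(extract(vertical), "en_1"), (extract(horizontal), "gu_1")]

    noisy = [row[:] for row in vertical]
    noisy[0][0] = 1  # a stray foreground pixel
    label = knn_predict(train, extract(noisy), k=1)
    print(label, "-> script:", label.split("_")[0])  # prints: en_1 -> script: en
```

Because the script is encoded as a label prefix in this sketch, splitting the predicted label recovers the script (separation) while the full label gives the digit (identification). A real system would train on many labelled samples per class and tune k on held-out data.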
Keywords :
document image processing; natural language processing; optical character recognition; pattern classification; statistical analysis; OCR system; bilingual printed document images; kNN classifier; mixed English-Gujarati digits; script identification problem; standard data samples; statistical approach; uniform sized image; Accuracy; Character recognition; Feature extraction; Image recognition; Optical character recognition software; Support vector machine classification; Normalization; Pre-processing; Vector
Conference_Title :
2013 International Conference on Intelligent Systems and Signal Processing (ISSP)
Conference_Location :
Gujarat
Print_ISBN :
978-1-4799-0316-0
DOI :
10.1109/ISSP.2013.6526900