Title :
Structural feature based approach for script identification from printed Indian document
Author :
Obaidullah, Sk Md ; Mondal, Aniruddha ; Roy, Kaushik
Author_Institution :
Dept. of Comput. Sc. & Eng., Aliah Univ., Kolkata, India
Abstract :
Script identification is a complex real-life problem in automating the processing of printed or handwritten documents. The task becomes more challenging in a multi-script, multilingual country like India. Before an OCR system can be developed for a particular language, the script must first be identified, which makes a script identification system a pressing need. To date, no such work is available that considers all 13 official Indian scripts. In this paper we present a scheme for script identification from printed documents for 10 official Indian scripts, namely Bangla, Devnagari, Roman, Oriya, Urdu, Gujarati, Telegu, Kannada, Malayalam and Kashmiri. A total of 459 document pages are considered, and a 62-dimensional feature set is computed for the present work. Finally, using a simple logistic classifier with 5-fold cross-validation, an average identification rate of 98.9% is obtained.
Keywords :
document image processing; handwriting recognition; natural language processing; optical character recognition; Bangla; Devnagari; Gujarati; India; Indian scripts; Kannada; Kashmiri; Malayalam; Oriya; Roman; Telegu; Urdu; document pages; handwritten document processing; multiscript-lingual country; printed Indian document; printed document processing; script identification; script identification system; structural feature based approach; Computers; Databases; Educational institutions; Feature extraction; Logistics; Optical character recognition software; Signal processing; Feature Set; OCR; Printed Script Identification; Simple Logistic Classifier;
Conference_Title :
Signal Processing and Integrated Networks (SPIN), 2014 International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-2865-1
DOI :
10.1109/SPIN.2014.6776933