DocumentCode :
1820962
Title :
Transform based approach for Indic script identification from handwritten document images
Author :
Obaidullah, Sk Md ; Karim, Rownaqul ; Shaikh, Sujal ; Halder, Chayan ; Das, Nibaran ; Roy, Kaushik
Author_Institution :
Aliah Univ., Kolkata, India
fYear :
2015
fDate :
26-28 March 2015
Firstpage :
1
Lastpage :
7
Abstract :
In a multi-script country like India, script identification from document images is an essential step before choosing the appropriate script-specific OCR (Optical Character Recognizer). Handwritten script identification is more challenging than its printed counterpart because of variations across writers, time, content, etc. Document image processing researchers are increasingly working to develop standard techniques for Indic script identification, but most existing work considers printed document images. This paper proposes a simple, robust and segmentation-free technique, based on several image transform methods and statistical features, to identify which of four popular Indic scripts (Bangla, Roman, Devanagari and Oriya) a document is written in. A dataset of 101 handwritten document images comprising more than 11,000 words and 1,300 lines, with an almost equal distribution of the four scripts, was built from writers of varying age, sex and educational qualification. In experiments, an MLP (Multilayer Perceptron) classifier achieved an average accuracy of 88.1% for the four-script combination under five-fold cross-validation. The average tri-script and bi-script accuracies were 89.7% and 94.3%, respectively. These results are promising given the inherent complexity of handwritten Indic scripts.
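The abstract does not say exactly which transforms or statistics are used; the keywords mention discrete cosine transforms. As an illustration only, the Python sketch below is not the authors' implementation. The names `dct_1d`, `dct_2d` and `transform_features`, the chosen statistics and the `low` cutoff are all hypothetical. It computes an orthonormal 2-D DCT of a whole grayscale page image without any line or word segmentation, then summarizes the coefficients with a few simple statistics. A vector like this could be fed to a classifier such as an MLP.

```python
import math
from statistics import mean, pstdev


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (naive O(n^2) loop)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        out.append(scale * s)
    return out


def dct_2d(img):
    """Separable 2-D DCT-II of a 2-D list: transform rows, then columns.

    Returns coeffs[u][v] with u = vertical frequency, v = horizontal frequency.
    """
    rows = [dct_1d(list(r)) for r in img]            # DCT along each row
    cols = [dct_1d(list(c)) for c in zip(*rows)]      # DCT along each column
    return [list(r) for r in zip(*cols)]              # transpose back


def transform_features(img, low=4):
    """Hypothetical segmentation-free feature vector for a whole page image.

    Assumption: img is a 2-D list of grayscale intensities. The statistics
    below are illustrative choices, not the ones used in the paper.
    """
    coeffs = dct_2d(img)
    flat = [c for row in coeffs for c in row]
    mags = [abs(c) for c in flat]
    energy = sum(c * c for c in flat)
    # Energy in the top-left low x low block (lowest spatial frequencies).
    low_energy = sum(
        coeffs[u][v] ** 2
        for u in range(min(low, len(coeffs)))
        for v in range(min(low, len(coeffs[0])))
    )
    return {
        "mean_abs": mean(mags),
        "std_abs": pstdev(mags),
        "energy": energy,
        "low_freq_ratio": low_energy / energy if energy else 0.0,
    }


if __name__ == "__main__":
    # Tiny synthetic "page" standing in for a real scanned document.
    page = [[(r * 7 + c * 3) % 256 for c in range(8)] for r in range(8)]
    print(transform_features(page))
```

Because both 1-D passes are orthonormal, `energy` equals the sum of squared pixel intensities (Parseval's theorem), which gives a quick sanity check. The naive loops are only practical for small or downsampled images. A real pipeline would more likely use an FFT-based DCT from a numerical library.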
Keywords :
document image processing; handwritten character recognition; image classification; image segmentation; multilayer perceptrons; natural language processing; optical character recognition; transforms; Bangla; Devanagari; India script identification; Indic script identification; MLP classifier; OCR; Oriya; Roman; bi-scripts accuracy; document image processing researcher; handwritten document image; handwritten script identification; image transform method; multilayer perceptron classifier; multiscript country; optical character recognizer; printed script document image; segmentation free technique; statistical feature; transform based approach; tri-scripts accuracy; Discrete cosine transforms; Encoding; Euclidean distance; Handwriting recognition; Image recognition; Image segmentation; Optical imaging; Handwritten Script Identification; Image Transform; MLP Classifier; OCR;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing, Communication and Networking (ICSCN), 2015 3rd International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4673-6822-3
Type :
conf
DOI :
10.1109/ICSCN.2015.7219852
Filename :
7219852