Title :
Page-level script identification from multi-script handwritten documents
Author :
Singh, Pawan Kumar ; Dalal, Santu Kumar ; Sarkar, Ram ; Nasipuri, Mita
Author_Institution :
Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
Abstract :
Script identification has long been a precursor to many Optical Character Recognition (OCR) processes in a multi-lingual document environment. It has numerous applications in document image analysis, such as document sorting, indexing, retrieval, and translation. In this paper, we develop a page-level script identification technique for handwritten documents using texture features. The texture features are extracted from the document pages based on the Gray Level Co-occurrence Matrix (GLCM). The proposed technique has been evaluated on four scripts, namely Bangla, Devnagari, Telugu, and Roman, using multiple classifiers. Based on their identification accuracies, the Multi Layer Perceptron (MLP) classifier is observed to perform best. The experimental results demonstrate the effectiveness of GLCM features in identifying handwritten scripts. Experiments are conducted on a total of 120 document pages, and the overall accuracy of the system is found to be 91.48%. Although the system is evaluated on a limited dataset, the result can be considered satisfactory given the complexities of the scripts.
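As a rough illustration of the GLCM texture features mentioned in the abstract, the following minimal sketch computes a co-occurrence matrix for one pixel offset and derives three common texture statistics from it. The abstract does not state which offsets, quantization levels, or statistics the authors used, so the function names (`glcm`, `texture_features`), the 8-level quantization, the symmetric normalization, and the choice of contrast, energy, and homogeneity are all assumptions made for illustration only.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset.

    Hypothetical helper: assumes an 8-bit grayscale page image (values 0..255)
    and non-negative offsets dx, dy.
    """
    img = np.asarray(image)
    # Quantize 0..255 intensities into `levels` gray-level bins.
    q = (img.astype(np.int64) * levels) // 256
    rows, cols = q.shape
    # Pair each reference pixel with its neighbour at offset (dy, dx).
    ref = q[0:rows - dy, 0:cols - dx]
    nbr = q[dy:rows, dx:cols]
    m = np.zeros((levels, levels), dtype=np.float64)
    # Count how often gray level a co-occurs with gray level b.
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)
    m = m + m.T          # count each pair in both directions (symmetric GLCM)
    return m / m.sum()   # normalize counts to joint probabilities

def texture_features(p):
    """Three common GLCM statistics (an assumed, illustrative feature set)."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }

if __name__ == "__main__":
    # Synthetic stand-in for a scanned page; a real pipeline would load an image.
    rng = np.random.default_rng(0)
    page = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    print(texture_features(glcm(page)))
```

In a page-level setting, feature vectors of this kind (possibly concatenated over several offsets) would be computed once per document page and passed to a classifier such as an MLP; the classifier configuration is not given in the abstract and is not sketched here.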
Keywords :
document image processing; feature extraction; identification; image classification; image texture; matrix algebra; multilayer perceptrons; optical character recognition; GLCM; MLP classifier; OCR; document image analysis; gray level cooccurrence matrix; multilayer perceptron; multiscript handwritten document; optical character recognition; page-level script identification; texture feature extraction; Accuracy; Feature extraction; Image analysis; Optical character recognition software; Optical imaging; Symmetric matrices; Text analysis; Gray Level Cooccurrence Matrix; Handwritten Indian scripts; Optical Character Recognition; Page-level script identification;
Conference_Title :
Computer, Communication, Control and Information Technology (C3IT), 2015 Third International Conference on
Conference_Location :
Hooghly
Print_ISBN :
978-1-4799-4446-0
DOI :
10.1109/C3IT.2015.7060113