DocumentCode :
3695262
Title :
Word-level script identification for handwritten Indic scripts
Author :
Pawan Kumar Singh;Ram Sarkar;Mita Nasipuri;David Doermann
Author_Institution :
Computer Science &
fYear :
2015
Firstpage :
1106
Lastpage :
1110
Abstract :
Automatic script identification from handwritten document images facilitates many important applications such as indexing, sorting and triage. A given Optical Character Recognition (OCR) system is typically trained on only a single script, so for documents or collections containing different scripts, the script must be identified automatically prior to OCR. For Indic script research, some results have been reported in the literature, but the task is far from solved. In this paper, we propose a word-level script identification technique for six handwritten Indic scripts (Bangla, Devanagari, Gurumukhi, Malayalam, Oriya and Telugu) and the Roman script. A set of 82 features has been designed using a combination of elliptical and polygonal approximation techniques. Our approach has been evaluated on a dataset of 7000 handwritten text words, using multiple classifiers. A Multi-Layer Perceptron (MLP) classifier performed best, achieving 95.35% accuracy. The result is promising considering the complexities and shape variations of the Indic scripts.
Keywords :
"Handwriting recognition","Integrated circuits","Optical character recognition software","Forensics","Feature extraction","Support vector machines","Bagging"
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/ICDAR.2015.7333932
Filename :
7333932