• DocumentCode
    3695262
  • Title
    Word-level script identification for handwritten Indic scripts
  • Author
    Pawan Kumar Singh;Ram Sarkar;Mita Nasipuri;David Doermann
  • Author_Institution
    Computer Science &
  • fYear
    2015
  • Firstpage
    1106
  • Lastpage
    1110
  • Abstract
    Automatic script identification from handwritten document images facilitates many important applications, such as indexing, sorting and triage. An Optical Character Recognition (OCR) system is typically trained on a single script, so for documents or collections containing multiple scripts, the script must be identified automatically before OCR is applied. Some results on Indic script identification have been reported in the literature, but the task is far from solved. In this paper, we propose a word-level script identification technique for six handwritten Indic scripts (Bangla, Devanagari, Gurumukhi, Malayalam, Oriya and Telugu) and the Roman script. A set of 82 features is designed using a combination of elliptical and polygonal approximation techniques. The approach is evaluated with multiple classifiers on a dataset of 7,000 handwritten words. A Multi-Layer Perceptron (MLP) classifier performs best, achieving 95.35% accuracy. The result is encouraging given the complexity and shape variation of the Indic scripts.
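    The abstract names elliptical and polygonal approximation but does not say how the 82 features are computed. As a hedged illustration only, the Python sketch below shows two textbook building blocks that shape features of this kind are commonly derived from: the ellipse with the same second-order central moments as a word's foreground pixels, and Ramer-Douglas-Peucker polygonal approximation of a contour. It is not the authors' feature set; the function names, the returned dictionary keys and the `epsilon` tolerance are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only: NOT the paper's 82-feature extractor.
Point = Tuple[float, float]


def ellipse_from_moments(pixels: Sequence[Point]) -> dict:
    """Ellipse with the same second-order central moments as the (x, y)
    coordinates of a word image's foreground pixels (hypothetical helper)."""
    n = len(pixels)
    if n == 0:
        raise ValueError("need at least one foreground pixel")
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Normalised central moments (population covariance of the coordinates).
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix [[mu20, mu11], [mu11, mu02]].
    half_trace = (mu20 + mu02) / 2
    root = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # clamp tiny negative round-off
    return {
        "centroid": (cx, cy),
        # For a uniformly filled ellipse, variance along an axis = semi-axis^2 / 4.
        "semi_major": 2 * math.sqrt(lam_major),
        "semi_minor": 2 * math.sqrt(lam_minor),
        "orientation": 0.5 * math.atan2(2 * mu11, mu20 - mu02),  # radians
        "eccentricity": math.sqrt(1 - lam_minor / lam_major) if lam_major > 0 else 0.0,
    }


def _point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker polygonal approximation of an open polyline."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the chord start-end.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], start, end)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [start, end]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


if __name__ == "__main__":
    # Toy inputs: a slanted pixel stroke and an L-shaped contour.
    stroke = [(i, i // 2) for i in range(10)]
    print(ellipse_from_moments(stroke))
    print(douglas_peucker([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))
    # → [(0, 0), (2, 0), (2, 2)]
```

    Quantities such as the ellipse's eccentricity and orientation, or the number of polygon vertices at a given tolerance, could then be concatenated into a feature vector for a classifier such as an MLP; how the paper actually combines them is not stated in this record.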
  • Keywords
    "Handwriting recognition","Integrated circuits","Optical character recognition software","Forensics","Feature extraction","Support vector machines","Bagging"
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICDAR.2015.7333932
  • Filename
    7333932