• DocumentCode
    643374
  • Title
    Comparison of different classifiers for script identification from handwritten document
  • Author
    Obaidullah, Sk Md; Roy, Kaushik; Das, Niladri
  • fYear
    2013
  • fDate
    26-28 Sept. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    For a multi-script, multilingual country like India, script identification is a complex real-life problem in automating document processing. Handwritten script identification is considerably more complex than printed script identification. Here, scripts from multi-script handwritten documents are identified, and the performance of several well-known classifiers is compared. We followed a two-stage approach. First, we identified six scripts, used for writing six official languages of India, that were readily available to us in the handwritten domain. Using abstract/mathematical features, structure-based features and script-dependent features at the document level, a 41-dimensional feature set is prepared. Then, a series of classifiers, namely Logistic Model Tree, Random Forest, Multi Layer Perceptron, Sequential Minimal Optimization, LibLINEAR, RBFNetwork and Fuzzy Unordered Rule Induction Algorithm, is applied to the feature set to classify the six handwritten scripts, and the results are compared. Among these classifiers, Logistic Model Tree achieves the highest accuracy, 91.2% under 5-fold cross-validation, whereas the SMO model has the lowest convergence time, 0.05 s.
  • Keywords
    document image processing; fuzzy set theory; image classification; multilayer perceptrons; radial basis function networks; text analysis; India; LibLINEAR; RBFNetwork; SMO model; abstract-mathematical features; classifier comparison; dimensional feature set; document processing automation; fuzzy unordered rule induction algorithm; handwritten script identification; logistic model tree; multilayer perceptron; multilingual country; multiscript country; multiscript handwritten documents; random forest; script dependent features; script identification; sequential minimal optimization; structure based features; Accuracy; Computers; Feature extraction; Fractals; Logistics; Neurons; Optical character recognition software; Classifier; Handwritten Script Identification; Optical Character Recognizer; Weka;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing, Computing and Control (ISPCC), 2013 IEEE International Conference on
  • Conference_Location
    Solan
  • Print_ISBN
    978-1-4673-6188-0
  • Type
    conf
  • DOI
    10.1109/ISPCC.2013.6663388
  • Filename
    6663388