• DocumentCode
    3777642
  • Title
    Post-processing methodology for word level Telugu character recognition systems using Unicode Approximation Models
  • Author
    N. Shobha Rani; T. Vasudev
  • Author_Institution
    Maharaja Research Foundation, Maharaja Institute of Technology, Mysuru, Karnataka, India
  • Volume
    1
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Digitization and automatic interpretation of document images into an editable document format is the primary goal of optical character recognition (OCR) systems. This paper proposes a novel technique for resolving post-processing errors in Telugu OCR using word-level Unicode Approximation Models (UAMs) through a mapper module. The mapper module performs a word-level one-to-one mapping, assigning each sequence of recognized class labels to the appropriate UAM. Each such sequence corresponds to a single word and is produced as output by the classifier. The proposed algorithm resolves segmentation errors, preprocessing errors such as cuts and merges in characters, noise, occlusions, semantic ordering errors, and confusable character classes. The proposed UAMs achieve consistent accuracies of about 96.2% for printed words and 91.7% for handwritten words.
  • Keywords
    "Optical character recognition software","Character recognition","Error correction","Program processors","Dictionaries","Databases","Computational modeling"
  • Publisher
    IEEE
  • Conference_Title
    2015 International Conference on Trends in Automation, Communications and Computing Technology (I-TACT-15)
  • Type
    conf
  • DOI
    10.1109/ITACT.2015.7492681
  • Filename
    7492681