• DocumentCode
    2637137
  • Title
    A high accuracy OCR system for printed Telugu text
  • Author
    Lakshmi, Vasantha C.; Patvardhan, C.
  • Author_Institution
    Dept. of Phys. & Comput. Sci., Dayalbagh Educ. Inst., Agra, India
  • Volume
    2
  • fYear
    2003
  • fDate
    15-17 Oct. 2003
  • Firstpage
    725
  • Abstract
    Telugu is one of the oldest and most popular languages of India. The paper describes the design and development of a Telugu optical character recognition system for printed text (TOSP). The preprocessing tasks considered are: conversion of a grey-scale image to a binary image; image rectification; skew detection and removal; and segmentation of text into lines, words and basic symbols. Basic symbols are identified as the fundamental unit of segmentation and are recognized by neural recognizers. The recognizers are aided by an improvement module that uses additional logic to correctly recognize confusing symbols, resulting in increased recognition accuracy. The combinations of these basic symbols that together form the characters and compound characters of Telugu are also determined to complete the recognition process. A special feature of TOSP is that it is designed to handle multiple sizes and multiple fonts. Further, the output produced by TOSP can be opened directly, and then edited, in any Indian-language software package that supports transliteration into Telugu script. Several such software packages are popular and readily available.
  • Keywords
    image segmentation; natural languages; neural nets; optical character recognition; Indian language software; Telugu optical character recognition; binary image; confusing symbols; grey scale image; high accuracy OCR system; image rectification; image segmentation; neural recognizers; printed text; recognition accuracy; skew detection; skew removal; Character recognition; Image converters; Image segmentation; Logic; Natural languages; Nonlinear optics; Optical character recognition software; Optical design; Optical sensors; Physics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON 2003. Conference on Convergent Technologies for the Asia-Pacific Region
  • Print_ISBN
    0-7803-8162-9
  • Type
    conf
  • DOI
    10.1109/TENCON.2003.1273274
  • Filename
    1273274
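The abstract lists grey-to-binary conversion and segmentation of text into lines among TOSP's preprocessing steps but does not say which techniques are used. As an illustration only, the Python sketch below uses two common textbook choices: Otsu's global threshold for binarization and a horizontal projection profile for line segmentation. Neither is claimed to be the method of the paper. The function names (`otsu_threshold`, `binarize`, `segment_lines`), the list-of-lists image type and the dark-text-on-light-page assumption are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: grey levels 0..255, one list per pixel row.
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Return the grey level t that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = 0    # number of pixels with level <= t
    sum_b = 0  # intensity sum of those pixels
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image, t: int) -> List[List[int]]:
    # 1 = ink, 0 = background; assumes dark text printed on a light page.
    return [[1 if p <= t else 0 for p in row] for row in img]


def segment_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return inclusive (first_row, last_row) pairs, one per maximal run of rows containing ink."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines


# Tiny synthetic page: two "text lines" separated by a blank row.
page = [
    [250, 250, 250, 250],
    [20, 250, 30, 250],
    [250, 250, 250, 250],
    [250, 10, 250, 250],
    [250, 15, 250, 250],
]
t = otsu_threshold(page)
print(t, segment_lines(binarize(page, t)))  # prints: 30 [(1, 1), (3, 4)]
```

A projection profile like this assumes the page has already been deskewed, which is why skew detection and removal are listed before segmentation in the abstract. Words and symbols could be split in the same spirit with vertical profiles inside each line.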