• DocumentCode
    2038751
  • Title
    Bilingual OCR system for printed documents in Malayalam and English
  • Author
    Rahiman, M.A. ; Adheena, C.V. ; Anitha, R. ; Deepa, N. ; Kumar, G. Manoj ; Rajasree, M.S.
  • Author_Institution
    Karpagam Univ., Coimbatore, India
  • Volume
    3
  • fYear
    2011
  • fDate
    8-10 April 2011
  • Firstpage
    40
  • Lastpage
    45
  • Abstract
    India is a multilingual, multi-script country in which a single line of a bilingual document may contain words in both a regional language and English. Recognizing documents that contain multiple scripts is a challenging task that requires additional effort from OCR designers to improve the accuracy rate. This paper presents a bilingual OCR system for printed Malayalam and English text. We propose an algorithm that accepts a scanned image of printed characters as input and produces editable Malayalam and English characters in a predefined format as output. The acquired image is segmented into lines and then into characters using a pixel-by-pixel approach that scans the image from top-left to bottom-right. Each character image obtained after segmentation is resized to a 16 × 16 bitmap, which is used for comparison. The database contains characters of both languages in various fonts, and each resized character image is compared against this database using a pixel-match algorithm. The matched character is displayed in Notepad. An efficiency of 87.25% is obtained using this approach.
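    The abstract does not give implementation details, so the following is only a minimal sketch of one plausible reading of its pipeline, not the authors' code. It assumes a binarized image (1 = ink, 0 = background), treats the top-left to bottom-right pixel scan as row and column projection profiles, scales each character to 16 × 16 by nearest-neighbour sampling, and defines the pixel-match score as the number of agreeing pixels. All function names and the toy "L"/"T" glyphs are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # assumption: binarized image, 1 = ink, 0 = background
SIZE = 16  # 16 x 16 comparison bitmap, as stated in the abstract


def split_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where the projection profile is non-zero."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment(image: Bitmap) -> List[Bitmap]:
    """Assumed segmentation: blank rows separate lines, blank columns separate characters."""
    chars = []
    for top, bottom in split_runs([sum(row) for row in image]):
        line = image[top:bottom]
        for left, right in split_runs([sum(col) for col in zip(*line)]):
            chars.append([row[left:right] for row in line])
    return chars


def crop(ch: Bitmap) -> Bitmap:
    """Trim blank rows above and below the character (each segment has some ink)."""
    rows = [i for i, row in enumerate(ch) if any(row)]
    return ch[rows[0]:rows[-1] + 1]


def resize(ch: Bitmap, size: int = SIZE) -> Bitmap:
    """Nearest-neighbour scaling to size x size (an assumed resampling method)."""
    h, w = len(ch), len(ch[0])
    return [[ch[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def pixel_match(a: Bitmap, b: Bitmap) -> int:
    """Assumed pixel-match score: count of positions where both bitmaps agree."""
    return sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(ch: Bitmap, templates: Dict[str, List[Bitmap]]) -> str:
    """Return the label whose stored bitmap (over all fonts) matches best."""
    probe = resize(crop(ch))
    best_label, best_score = "", -1
    for label, bitmaps in templates.items():
        for t in bitmaps:
            score = pixel_match(probe, t)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two 3 x 3 glyphs placed side by side in a small page.
L_GLYPH = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
T_GLYPH = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
IMAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
TEMPLATES = {"L": [resize(L_GLYPH)], "T": [resize(T_GLYPH)]}

if __name__ == "__main__":
    print("".join(recognize(c, TEMPLATES) for c in segment(IMAGE)))
```

    Storing a list of bitmaps per label lets the database hold the same character in several fonts, as the abstract describes; the best-scoring bitmap across all fonts decides the output character.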
  • Keywords
    document image processing; optical character recognition; English; Malayalam; bilingual OCR system; bilingual document; character image; documents recognition; image segmentation; pixel by pixel approach; printed documents; Character recognition; Databases; Feature extraction; Optical character recognition software; Optical imaging; Pixel; Bilingual OCR; Handwritten characters
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2011 3rd International Conference on Electronics Computer Technology (ICECT)
  • Conference_Location
    Kanyakumari
  • Print_ISBN
    978-1-4244-8678-6
  • Electronic_ISBN
    978-1-4244-8679-3
  • Type
    conf
  • DOI
    10.1109/ICECTECH.2011.5941797
  • Filename
    5941797