• DocumentCode
    2306583
  • Title
    Segmenting Bangla text for optical recognition
  • Author
    Sattar, Md. A.; Mahmud, Khaled; Arafat, Humayun; Zaman, A. F. M. Noor Uz
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Bangladesh Univ. of Eng. & Technol. (BUET), Dhaka
  • fYear
    2007
  • fDate
    27-29 Dec. 2007
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    One of the main causes of poor recognition rates in optical character recognition (OCR) systems is error in character segmentation. The presence of different types of characters in scanned documents makes it difficult to design an effective character segmentation procedure. This paper presents a new technique for identifying and segmenting printed Bengali characters, with the aim of recognizing them efficiently. Our line segmentation success rate is 99.7% on 1000 tested lines, and our word segmentation success rate is 99.8% on 4900 tested words. In our experiments, isolated characters were assigned to the isolated group in 99.50% of cases. Most errors come from connected characters and from characters preceded by tau, because we rely on width to segment tau; in particular, most errors came from components in which two characters touch at multiple points.
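The abstract reports line and word segmentation rates but does not describe the procedure. A common baseline for printed Bangla is projection-profile segmentation: blank pixel rows separate text lines, and because the matra (headline) usually joins the letters of a word, blank pixel columns within a line tend to fall between words. The Python sketch that follows illustrates only that generic baseline on a binary image stored as nested lists (1 = ink). It is not the authors' technique; the function names, the `min_gap` threshold, and the toy image are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical representation: 1 = ink pixel, 0 = background


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges where the profile is non-zero."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Horizontal projection: rows with no ink separate text lines."""
    row_profile = [sum(row) for row in img]
    return runs_of_ink(row_profile)


def segment_words(img: Image, top: int, bottom: int, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Vertical projection over rows [top, bottom) of one text line.

    Ink runs separated by fewer than `min_gap` blank columns (an assumed
    threshold) are merged, treating narrow gaps as intra-word gaps.
    """
    width = len(img[0])
    col_profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    words: List[Tuple[int, int]] = []
    for start, end in runs_of_ink(col_profile):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)  # extend the previous word
        else:
            words.append((start, end))
    return words


# Toy 4x8 binary image: two text lines separated by the blank row 2.
toy = [
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]

print(segment_lines(toy))           # [(0, 2), (3, 4)]
print(segment_words(toy, 0, 2))     # [(0, 3), (6, 8)]
print(segment_words(toy, 3, 4))     # [(0, 5)]
```

In the second text line, the single blank column at index 2 is narrower than `min_gap`, so the two ink runs are merged into one word. A real system would also need to handle touching characters and width-based cases such as the tau errors the abstract mentions, which simple projection profiles do not resolve.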
  • Keywords
    natural language processing; optical character recognition; text analysis; Bengali printed characters; character segmentation procedure; line segmentation; text segmentation; word segmentation; Character recognition; Computer errors; Computer science; Handwriting recognition; Image segmentation; Optical character recognition software; Optical distortion; Optical noise; Testing; Text recognition; Bangla Language Processing; Bangla OCR; Bangla Text segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer and Information Technology, 2007. ICCIT 2007. 10th International Conference on
  • Conference_Location
    Dhaka
  • Print_ISBN
    978-1-4244-1550-2
  • Electronic_ISBN
    978-1-4244-1551-9
  • Type
    conf
  • DOI
    10.1109/ICCITECHN.2007.4579373
  • Filename
    4579373