  • DocumentCode
    1864397
  • Title
    An efficient line segmentation approach for handwritten Bangla document image
  • Author
    Mullick, K.; Banerjee, S.; Bhattacharya, U.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Heritage Inst. of Technol., Kolkata, India
  • fYear
    2015
  • fDate
    4-7 Jan. 2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Text line segmentation plays a vital role in the overall performance of a document recognition system. Similar segmentation work for offline handwritten Bangla documents is rarely found in the literature. Moreover, certain peculiarities of handwritten Bangla script, such as the widespread occurrence of ascenders and descenders and characters that appear only as an ascender or a descender, cause unique difficulties for this segmentation task. Connected components that span several successive text lines are common in unconstrained handwritten Bangla documents. In this article, we propose a novel and efficient approach to text line segmentation. First, we smudge the input document image to blur out the white spaces between words while preserving the gaps between consecutive lines. Next, we obtain an initial segmentation by shredding the image along the whitest pixels between consecutive smudged lines. Multi-line connected components are handled by thinning them and then finding the most probable point of separation within each component. Combining this with the initial segmentation gives the final output. The proposed approach has been evaluated on the Bangla portion of the ICDAR 2013 Handwriting Segmentation Contest dataset. The segmentation results show the efficiency of the proposed approach.
  • Keywords
    document image processing; handwritten character recognition; image segmentation; text detection; ICDAR Handwriting Segmentation Contest dataset; ascender occurrence; component separation; consecutive smudged lines; descender occurrence; document recognition system; gap preservation; handwritten Bangla script; image pixels; image shredding; image thinning; input document image; multiline connected components; offline handwritten Bangla documents; text line segmentation approach; unconstrained handwritten Bangla document image; white space blurring-out; Estimation; Frequency modulation; Handwriting recognition; Image color analysis; Image segmentation; Junctions; Transforms; Handwritten Bangla document image; Handwritten document segmentation; Line segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Pattern Recognition (ICAPR), 2015 Eighth International Conference on
  • Conference_Location
    Kolkata
  • Type
    conf
  • DOI
    10.1109/ICAPR.2015.7050679
  • Filename
    7050679
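
The first two steps named in the abstract (smudging, then cutting along the whitest rows between smudged lines) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary image with ink = 1 and background = 0, replaces the paper's smudging with a plain horizontal box blur, and replaces its shredding with straight horizontal cuts. The function names and the parameters `width` and `frac` are hypothetical, and the handling of multi-line connected components by thinning is not covered.

```python
import numpy as np

def smudge_rows(binary, width=9):
    """Blur each row horizontally so gaps between words fill in.
    Rows are blurred independently, so blank rows between lines stay blank."""
    width = min(width, binary.shape[1])  # keep np.convolve output the same length as the row
    kernel = np.ones(width) / width
    blur = lambda row: np.convolve(row, kernel, mode="same")
    return np.apply_along_axis(blur, 1, binary.astype(float))

def find_cut_rows(binary, width=9, frac=0.1):
    """Return one cut row per gap between text bands (assumed simplification:
    straight horizontal cuts at the whitest row of each interior gap)."""
    profile = smudge_rows(binary, width).sum(axis=1)  # smudged ink per row
    if profile.max() == 0:
        return []
    is_gap = profile <= frac * profile.max()
    cuts, row, n = [], 0, len(profile)
    while row < n:
        if not is_gap[row]:
            row += 1
            continue
        start = row
        while row < n and is_gap[row]:
            row += 1
        if start > 0 and row < n:  # skip the top and bottom margins
            seg = profile[start:row]
            ties = np.flatnonzero(seg == seg.min())
            cuts.append(int(start + ties[len(ties) // 2]))  # middle of the whitest rows
    return cuts
```

For a page with two text bands separated by blank rows, `find_cut_rows` returns a single row index inside the blank band. Real handwritten pages have skewed and touching lines, which is why straight cuts like these would only serve as a starting point.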