Title :
An efficient line segmentation approach for handwritten Bangla document image
Author :
Mullick, K. ; Banerjee, S. ; Bhattacharya, U.
Author_Institution :
Dept. of Comput. Sci. & Eng., Heritage Inst. of Technol., Kolkata, India
Abstract :
Text line segmentation plays a vital role in the overall performance of a document recognition system. Such segmentation studies for offline handwritten Bangla documents are rare in the literature. Moreover, certain peculiarities of handwritten Bangla script, such as the widespread occurrence of ascenders and descenders and characters that appear only as an ascender or a descender, cause difficulties specific to this task. Connected components that span several successive text lines are also common in unconstrained handwritten Bangla documents. In this article, we propose a novel and efficient approach for text line segmentation. First, we smudge the input document image to blur out the white spaces between words while preserving the gaps between consecutive lines. Next, we obtain an initial segmentation by shredding the image along the whitest pixels between consecutive smudged lines. Multi-line connected components are handled by thinning them and then finding the most probable point of separation within each component. Combining this with the initial segmentation gives the final output. The proposed approach has been evaluated on the Bangla portion of the ICDAR 2013 Handwriting Segmentation Contest dataset. The segmentation results show the efficiency of the proposed approach.
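The smudge-and-shred idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the horizontal box blur, the dynamic-programming seam, and the names `smudge`, `whitest_seam`, `radius`, `top` and `bottom` are all assumptions chosen for illustration, and the thinning step for multi-line connected components is not covered. The image is assumed to be a binary list of rows (1 = ink, 0 = background), and the band of rows between two text lines is assumed to be known already.

```python
def smudge(image, radius):
    """Horizontal box blur on a binary image (1 = ink, 0 = background).

    Stand-in for the smudging step: ink spreads sideways, so gaps
    between words fill in while gaps between lines stay white.
    """
    out = []
    for row in image:
        n = len(row)
        blurred = []
        for x in range(n):
            lo, hi = max(0, x - radius), min(n, x + radius + 1)
            blurred.append(sum(row[lo:hi]) / (hi - lo))
        out.append(blurred)
    return out


def whitest_seam(cost, top, bottom):
    """Minimum-ink left-to-right path confined to rows top..bottom-1.

    Returns one row index per column. The path moves at most one row
    up or down between neighbouring columns.
    """
    width = len(cost[0])
    rows = range(top, bottom)
    total = {y: cost[y][0] for y in rows}  # best cost ending at row y
    back = []  # back[x - 1][y] = predecessor row of y at column x
    for x in range(1, width):
        new_total, step = {}, {}
        for y in rows:
            candidates = [p for p in (y - 1, y, y + 1) if top <= p < bottom]
            prev = min(candidates, key=lambda p: total[p])
            new_total[y] = total[prev] + cost[y][x]
            step[y] = prev
        total = new_total
        back.append(step)
    # Backtrack from the cheapest end point in the last column.
    y = min(rows, key=lambda r: total[r])
    seam = [y]
    for step in reversed(back):
        y = step[y]
        seam.append(y)
    seam.reverse()
    return seam
```

Calling `whitest_seam(smudge(image, 1), top, bottom)` yields a cut path that can bend around ascenders and descenders intruding into the gap, which a straight horizontal cut cannot do.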
Keywords :
document image processing; handwritten character recognition; image segmentation; text detection; ICDAR Handwriting Segmentation Contest dataset; ascender occurrence; component separation; consecutive smudged lines; descender occurrence; document recognition system; gap preservation; handwritten Bangla script; image pixels; image shredding; image thinning; input document image; multiline connected components; offline handwritten Bangla documents; text line segmentation approach; unconstrained handwritten Bangla document image; white space blurring-out; Estimation; Frequency modulation; Handwriting recognition; Image color analysis; Image segmentation; Junctions; Transforms; Handwritten Bangla document image; Handwritten document segmentation; Line segmentation;
Conference_Title :
Advances in Pattern Recognition (ICAPR), 2015 Eighth International Conference on
Conference_Location :
Kolkata
DOI :
10.1109/ICAPR.2015.7050679