Title :
Fringe Map Based Text Line Segmentation of Printed Telugu Document Images
Author :
Koppula, Vijaya Kumar ; Negi, Atul
Author_Institution :
Dept. of CSE, CMR Coll. of Eng. & Technol., Hyderabad, India
Abstract :
Text line segmentation is a crucial step that can greatly influence the accuracy of an OCR system. One of the major obstacles to building high-accuracy OCR systems for Indic scripts has been the text line segmentation problem. For Telugu script in particular, this problem has yet to be adequately addressed by research. The common methods used for Roman script are not applicable because of the inherent script complexity of Telugu. Previous approaches to Telugu OCR in the literature take a simplified view of the problem, leading to errors in line segmentation. The problem is compounded in old documents that were typeset manually and have non-uniform print quality. In this work we propose a new method using the fringe map concept. In a fringe map, each pixel of the binary image is associated with a fringe number that denotes the distance to the nearest black pixel. We use fringe value information to segment text lines. First, we locate peak fringe numbers (PFNs). PFNs that are not between lines are filtered out. PFNs between adjacent lines are used to construct a region. The segmenting path between the adjacent lines is found by joining the filtered PFNs of a region.
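The fringe map idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state which distance metric is used, so city-block (4-connected) distance is assumed here. The function names `fringe_map` and `column_peaks` are hypothetical, and the peak rule (a local maximum along each column) is only one plausible reading of "peak fringe number"; the filtering, region construction, and path-joining steps are not shown.

```python
from collections import deque


def fringe_map(image):
    """Return the fringe number of every pixel.

    `image` is a list of rows where 1 is a black (ink) pixel and 0 is white.
    The fringe number is the city-block distance to the nearest black pixel
    (an assumed metric). Assumes the image contains at least one black pixel.
    """
    rows, cols = len(image), len(image[0])
    fringe = [[None] * cols for _ in range(rows)]
    queue = deque()

    # Every black pixel is a source with fringe number 0.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1:
                fringe[r][c] = 0
                queue.append((r, c))

    # Multi-source breadth-first search over 4-connected neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and fringe[nr][nc] is None:
                fringe[nr][nc] = fringe[r][c] + 1
                queue.append((nr, nc))
    return fringe


def column_peaks(fringe):
    """Return (row, col) positions that are local maxima along their column.

    Hypothetical peak rule: the value is at least as large as the pixels
    directly above and below, and strictly larger than at least one of them.
    """
    peaks = []
    for r in range(1, len(fringe) - 1):
        for c in range(len(fringe[0])):
            above, here, below = fringe[r - 1][c], fringe[r][c], fringe[r + 1][c]
            if here >= above and here >= below and (here > above or here > below):
                peaks.append((r, c))
    return peaks


# Two horizontal strokes standing in for adjacent text lines.
image = [
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
fm = fringe_map(image)
peaks = column_peaks(fm)
```

In this toy image the peaks fall on the middle row, halfway between the two strokes, which is where a separating path between the two lines would be expected to run.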
Keywords :
document image processing; image segmentation; Indic scripts; OCR systems; Roman script; fringe map based text line segmentation; fringe value information; old documents; printed Telugu document images; Character recognition; Image recognition; Image segmentation; Merging; Optical character recognition software; Text recognition; White spaces; Fringe Maps; Indic scripts; Telugu OCR; Text line segmentation
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2011.260