Title :
A Survey on OCR for Overlapping and Broken Characters in Document Image: Problem with Overlapping and Broken Characters in Document Image
Author :
Gaur, Abhishek Kumar ; Bharangar, Devendra Singh ; Trivedi, Munesh Chand
Author_Institution :
CSE Dept., ABES Eng. Coll., Ghaziabad, India
Abstract :
Optical character recognition (OCR) sits at the intersection of natural language processing and image processing. An OCR system converts text that is present in image form into machine-readable text. Identifying text in printed, handwritten, and degraded document images is challenging because of the high inter- and intra-variation between the background and the foreground of the document image. Degraded documents may be historical records, secret messages, or any other material of value, so recovering their text is a critical issue. Degradation can result from age, deliberate information hiding, various types of image noise, and other causes. The difficulty increases when the text in a document image is degraded, or when characters or text lines overlap. Segmenting word-level text into characters is one of the important challenges in optical character recognition, because touching or broken characters cannot be separated easily from one another. This paper focuses on finding and applying an efficient method, and discusses several solution-based techniques, for segmenting touching characters in Indian languages. It also proposes a framework that combines these solutions to obtain the maximum benefit. The proposed work on recognizing overlapping characters in document images is aimed primarily at Indian languages.
Keywords :
data encapsulation; document image processing; natural language processing; optical character recognition; text analysis; Indian languages; OCR; broken characters; document image; image processing; information hiding; overlapping characters; text information; text segmentation; Character recognition; Histograms; Image color analysis; Image segmentation; Optical character recognition software; Reservoirs; Degraded Image; Segmentation; Textual Properties; Zone
Conference_Title :
Computational Intelligence and Communication Networks (CICN), 2014 International Conference on
Print_ISBN :
978-1-4799-6928-9
DOI :
10.1109/CICN.2014.42