Title :
Detection and segmentation of lines and words in Gurmukhi handwritten text
Author :
Kumar, Rajiv ; Singh, Amardeep
Author_Institution :
SMCA, Thapar Univ., Patiala, India
Abstract :
A scanned text image is not editable: although it contains text, that text cannot be edited or changed when the scanned document needs revision. This limitation motivates optical character recognition (OCR), the process of recognizing a segmented part of a scanned image as a character. The overall OCR process consists of three major subprocesses: preprocessing, segmentation, and recognition. Of these three, segmentation is the backbone of the overall process, because incorrect segmentation cannot yield correct recognition results; it is a case of garbage in, garbage out. Segmentation is also one of the more complex steps, and it is harder still for handwritten documents, which offer only a few cues that can be used to segment the text. In this paper, we formulate an approach to segmenting a scanned document image. The approach initially treats the whole image as one large window. This window is then broken into smaller windows, each containing one line; once the lines are identified, each line window is used to find the words in that line, and finally the characters. For this purpose we use a variable-sized window, that is, a window whose size can be adjusted as needed. The concept was implemented and the results were analyzed. After the analysis, the concept was modified and then tried on different documents, with reasonably good results.
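As an illustration of the top-down windowing described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a binarized page (1 = ink, 0 = background) and uses horizontal and vertical projection profiles to decide where each window is cut; that choice is a common technique swapped in here, since the abstract does not say how the windows are resized. The names `runs`, `segment_lines`, `segment_words`, and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple

# Assumed input: a binarized page as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]
Span = Tuple[int, int]  # half-open (start, end) index range


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Return spans of consecutive non-zero entries in a projection profile.

    Spans separated by fewer than min_gap zero entries are merged, so small
    gaps (for example between characters of one word) do not split a window.
    """
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a window opens at the first ink
        elif not value and start is not None:
            spans.append((start, i))       # the window closes at the first blank
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Span] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], span[1])
        else:
            merged.append(span)
    return merged


def segment_lines(image: Image) -> List[Span]:
    """Split the whole-page window into line windows (row spans)."""
    row_profile = [sum(row) for row in image]
    return runs(row_profile)


def segment_words(image: Image, line: Span, min_gap: int = 2) -> List[Span]:
    """Split one line window into word windows (column spans)."""
    top, bottom = line
    width = len(image[0]) if image else 0
    col_profile = [
        sum(image[r][c] for r in range(top, bottom)) for c in range(width)
    ]
    return runs(col_profile, min_gap)
```

Given such a page, `segment_lines(page)` returns the row span of each line, and `segment_words(page, line)` returns the column spans of the words inside one line window. The `min_gap` threshold would need tuning for handwriting, where gaps inside a word can be irregular.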
Keywords :
handwritten character recognition; image segmentation; optical character recognition; Gurmukhi handwritten text; OCR recognition; OCR theory; image preprocessing; lines segmentation; optical character recognition theory; scanned document image segmentation; scanned text image; words segmentation; Application software; Banking; Bones; Character recognition; Computer vision; Flowcharts; Image recognition; Image segmentation; Office automation; Optical character recognition software; Characteristics; Flexible; Gurmukhi; Handwritten; OCR; Segmentation;
Conference_Title :
Advance Computing Conference (IACC), 2010 IEEE 2nd International
Conference_Location :
Patiala
Print_ISBN :
978-1-4244-4790-9
Electronic_ISBN :
978-1-4244-4791-6
DOI :
10.1109/IADCC.2010.5422927