DocumentCode
3136603
Title
A System for Handwritten and Machine-Printed Text Separation in Bangla Document Images
Author
Banerjee, Prithu ; Chaudhuri, Bidyut B.
Author_Institution
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Kolkata, India
fYear
2012
fDate
18-20 Sept. 2012
Firstpage
758
Lastpage
762
Abstract
In this paper, we describe an approach to distinguish between hand-written and machine-printed text in annotated machine-printed Bangla document images. In applications involving OCR, distinguishing machine-printed from hand-written characters is important so that each can be sent to a separate recognition engine. Identifying hand-written parts is also useful for deleting them and cleaning the document image. We present a classification system that takes a connected component of the document image and assigns it to one of two classes, "machine-printed" or "hand-written". The proposed system contains a preprocessing step, which smooths the object borders and finds the connected components. Bangla script-specific features are extracted from each connected component image, and a standard SVM-based classifier generates the final response. Experimental results on a data set show that the proposed approach achieves an overall accuracy of 96.49%.
Keywords
document image processing; feature extraction; handwritten character recognition; image classification; natural language processing; optical character recognition; support vector machines; text detection; Bangla script specific feature extraction; OCR; SVM; annotated machine-printed Bangla document images; classification system; connected component image; hand-written characters; hand-written class; hand-written parts; handwritten text separation; machine-printed characters; machine-printed class; machine-printed text separation; object border; recognition engines; standard classifier; Accuracy; Feature extraction; Handwriting recognition; Support vector machines; Text recognition; Training; Bangla Script Recognition; Printed and Handwritten Text Separation; SVM Classifier;
fLanguage
English
Publisher
IEEE
Conference_Titel
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location
Bari
Print_ISBN
978-1-4673-2262-1
Type
conf
DOI
10.1109/ICFHR.2012.171
Filename
6424488