• DocumentCode
    3136603
  • Title

    A System for Handwritten and Machine-Printed Text Separation in Bangla Document Images

  • Author

    Banerjee, Prithu ; Chaudhuri, Bidyut B.

  • Author_Institution
    Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Kolkata, India
  • fYear
    2012
  • fDate
    18-20 Sept. 2012
  • Firstpage
    758
  • Lastpage
    762
  • Abstract
    In this paper, we describe an approach to distinguish hand-written text from machine-printed text in annotated machine-printed Bangla document images. In applications involving OCR, distinguishing machine-printed from hand-written characters is important so that each can be sent to a separate recognition engine. Identifying the hand-written parts is also useful for deleting them and cleaning the document image. The classification system presented here takes each connected component in the document image and assigns it to one of two classes, "machine-printed" or "hand-written". The proposed system contains a preprocessing step, which smooths the object border and finds the connected components. Bangla script specific features are extracted from each connected component image, and a standard SVM-based classifier generates the final response. Experimental results on a data set show that the proposed approach achieves an overall accuracy of 96.49%.
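    The connected-component stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 8-connectivity choice, the toy image, and the features (bounding-box size and ink density) are all assumptions made for illustration. The Bangla script specific features and the SVM classifier from the paper are not reproduced here.

```python
from collections import deque


def connected_components(image):
    """Return the 8-connected foreground components of a binary image.

    `image` is a list of rows containing 0 (background) or 1 (ink).
    Each component is a list of (row, col) pixel coordinates.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def placeholder_features(pixels):
    """Hypothetical stand-in features: (height, width, ink density).

    These are NOT the Bangla-specific features used in the paper.
    """
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    density = len(pixels) / (height * width)
    return (height, width, density)


# Toy binary image: a solid 2x2 block and a 2-pixel diagonal stroke.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
components = connected_components(page)
features = [placeholder_features(c) for c in components]
```

    In a full pipeline, each feature vector would then be passed to a trained two-class classifier (an SVM in the paper) to label the component as "machine-printed" or "hand-written".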
  • Keywords
    document image processing; feature extraction; handwritten character recognition; image classification; natural language processing; optical character recognition; support vector machines; text detection; Bangla script specific feature extraction; OCR; SVM; annotated machine-printed Bangla document images; classification system; connected component image; hand-written characters; hand-written class; hand-written parts; handwritten text separation; machine-printed characters; machine-printed class; machine-printed text separation; object border; recognition engines; standard classifier; Accuracy; Feature extraction; Handwriting recognition; Support vector machines; Text recognition; Training; Bangla Script Recognition; Printed and Handwritten Text Separation; SVM Classifier;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
  • Conference_Location
    Bari
  • Print_ISBN
    978-1-4673-2262-1
  • Type

    conf

  • DOI
    10.1109/ICFHR.2012.171
  • Filename
    6424488