• DocumentCode
    2195903
  • Title

    Document-Zone Classification in Torn Documents

  • Author

    Chanda, Sukalpa ; Franke, Katrin ; Pal, Umapada

  • Author_Institution
    Dept. of Comput. Sci. & Media Technol., Gjovik Univ. Coll., Gjovik, Norway
  • fYear
    2010
  • fDate
    16-18 Nov. 2010
  • Firstpage
    25
  • Lastpage
    30
  • Abstract
    Arbitrary orientation and sparse data content are common characteristics of torn documents. To ensure accuracy and reliability in computer-based analysis, content-zone segmentation is required. In our previous work, we studied segmentation of handwritten and printed text. A questioned document piece in the form of an office note, however, might also contain non-text data such as logos, graphics, and pictures. Hence, a more precise content-zone classification is required. In this paper, we propose a two-tier approach for segmenting non-text content, handwriting, and printed text. The first tier discriminates text regions from non-text regions. The second tier classifies each text zone identified in the first tier as handwritten or printed. Gabor features and chain-code features are used in Tier-1 and Tier-2, respectively. Using an SVM classifier, we successfully identified 97.65% of 31,227 text regions in our current test data. The proposed approach identified 98.69% of printed and 96.39% of handwritten text among all identified text regions.
  • Keywords
    support vector machines; text analysis; Gabor features; SVM classifier; arbitrary orientation; chain-code features; computer-based analysis; content-zone segmentation; document-zone classification; handwritten segmentation; nontext segmentation; printed text segmentation; sparse data content; torn documents; two-tier approach; Printed and Handwritten Text Segmentation; Text Classification; Text Graphics Segmentation; Torn Document Recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
  • Conference_Location
    Kolkata
  • Print_ISBN
    978-1-4244-8353-2
  • Type

    conf

  • DOI
    10.1109/ICFHR.2010.12
  • Filename
    5693495
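
The two-tier flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `classify_zone`, `chain_code_histogram`, and `FREEMAN_CODES` are hypothetical, the two tier classifiers are passed in as plain callables (in the paper they are SVMs over Gabor and chain-code features), and the 8-direction Freeman histogram is only one common way to build a chain-code feature, assumed here because the record does not give the exact definition.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Point = Tuple[int, int]
Zone = TypeVar("Zone")

# Assumed convention: 8-direction Freeman codes in image coordinates
# (x grows to the right, y grows downward), so code 2 points "up".
FREEMAN_CODES: Dict[Point, int] = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code_histogram(contour: Sequence[Point]) -> List[float]:
    """Normalized histogram of Freeman directions along an 8-connected contour."""
    counts = [0] * 8
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        counts[FREEMAN_CODES[(x1 - x0, y1 - y0)]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 8


def classify_zone(
    zone: Zone,
    is_text: Callable[[Zone], bool],
    is_handwritten: Callable[[Zone], bool],
) -> str:
    """Two-tier cascade: tier 1 filters non-text, tier 2 splits text zones."""
    # Tier 1: text vs. non-text (placeholder for a classifier on Gabor features).
    if not is_text(zone):
        return "non-text"
    # Tier 2 runs only on zones that tier 1 accepted as text
    # (placeholder for a classifier on chain-code features).
    return "handwritten" if is_handwritten(zone) else "printed"
```

In use, `is_text` and `is_handwritten` would wrap trained classifiers; because tier 2 only sees zones that passed tier 1, a tier-1 miss cannot be recovered later in the cascade.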