Title :
Document-Zone Classification in Torn Documents
Author :
Chanda, Sukalpa ; Franke, Katrin ; Pal, Umapada
Author_Institution :
Dept. of Comput. Sci. & Media Technol., Gjovik Univ. Coll., Gjovik, Norway
Abstract :
Arbitrary orientation and sparse data content are common characteristics of torn documents. To ensure accuracy and reliability in computer-based analysis, content-zone segmentation is required. In our previous work, we studied segmentation of handwritten and printed text. A questioned document piece in the form of an office note, however, might also contain non-text data such as logos, graphics, and pictures. Hence a more precise content-zone classification is required. In this paper we propose a two-tier approach for segmenting non-text, handwritten text, and printed text. The first tier discriminates between text and non-text regions. The second tier classifies each text zone identified in the first tier as handwritten or printed. Gabor features are used in Tier-1 and chain-code features in Tier-2. Using an SVM classifier, we correctly identified 97.65% of the 31,227 text regions in our current test data. Among the identified text regions, the proposed approach identified 98.69% of printed text and 96.39% of handwritten text.
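The two-tier control flow described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `classify_zones` and all four callables are hypothetical placeholders. In the paper's setting, the Tier-1 feature extractor would compute Gabor features, the Tier-2 extractor would compute chain-code features, and both decision functions would be trained SVM classifiers; none of those are implemented here.

```python
from typing import Any, Callable, Iterable, List


def classify_zones(
    regions: Iterable[Any],
    tier1_features: Callable[[Any], Any],    # hypothetical: e.g. Gabor features
    tier1_is_text: Callable[[Any], bool],    # hypothetical: e.g. a trained SVM
    tier2_features: Callable[[Any], Any],    # hypothetical: e.g. chain-code features
    tier2_is_printed: Callable[[Any], bool], # hypothetical: e.g. a trained SVM
) -> List[str]:
    """Label each region as 'non-text', 'printed', or 'handwritten'."""
    labels = []
    for region in regions:
        # Tier 1: separate text from non-text (logos, graphics, pictures).
        if not tier1_is_text(tier1_features(region)):
            labels.append("non-text")
            continue
        # Tier 2: only regions accepted as text in Tier 1 reach this step.
        if tier2_is_printed(tier2_features(region)):
            labels.append("printed")
        else:
            labels.append("handwritten")
    return labels
```

One consequence of this cascade design is that Tier-2 never sees regions rejected in Tier-1, which is why the abstract reports the printed and handwritten rates relative to the identified text regions.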
Keywords :
support vector machines; text analysis; Gabor features; SVM classifier; arbitrary orientation; chain-code features; computer-based analysis; content-zone segmentation; document-zone classification; handwritten segmentation; nontext segmentation; printed text segmentation; sparse data content; torn documents; two-tier approach; Printed and Handwritten Text Segmentation; Text Classification; Text Graphics Segmentation; Torn Document Recognition;
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4244-8353-2
DOI :
10.1109/ICFHR.2010.12