Regional Information Center for Science and Technology - The Convergence of Iterated Classification

DocumentCode :

3341473

Title :

The Convergence of Iterated Classification

Author :

An, Chang ; Baird, Henry S.

Author_Institution :

Comput. Sci. & Eng. Dept., Lehigh Univ., Bethlehem, PA

fYear :

2008

fDate :

16-19 Sept. 2008

Firstpage :

663

Lastpage :

670

Abstract :

We report an improved methodology for training a sequence of classifiers for document image content extraction, that is, the location and segmentation of regions containing handwriting, machine-printed text, photographs, blank space, etc. The resulting segmentation is pixel-accurate, and so accommodates a wide range of zone shapes (not merely rectangles). We have systematically explored the best scale (spatial extent) of features. We have found that the methodology is sensitive to ground-truthing policy, and especially to precision of ground-truth boundaries. Experiments on a diverse test set of 83 document images show that tighter ground-truth reduces per-pixel classification errors by 45% (from 38.9% to 21.4%). Strong evidence, from both experiments and simulation, suggests that iterated classification converges region boundaries to the ground-truth (i.e. they don't drift). Experiments show that four-stage iterated classifiers reduce the error rates by 24%. We also present an analysis of special cases suggesting reasons why boundaries converge to the ground-truth.

Keywords :

document image processing; error statistics; image classification; image segmentation; iterative methods; document image content extraction; error rates; four-stage iterated classifiers; iterated classification; Computer science; Convergence; Drives; Error analysis; Feature extraction; Image analysis; Image converters; Image segmentation; Shape; Text analysis; content inventory; convergence; document content extraction; iterated classification; layout analysis; shape-oblivious segmentation; uniform content classification;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on

Conference_Location :

Nara

Print_ISBN :

978-0-7695-3337-7

Type :

conf

DOI :

10.1109/DAS.2008.52

Filename :

4670019

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3341473