Title :
Truthing for Pixel-Accurate Segmentation
Author :
Moll, Michael A.; Baird, Henry S.; An, Chang
Author_Institution :
Comput. Sci. & Eng. Dept., Lehigh Univ., Bethlehem, PA
Abstract :
We discuss problems in developing policies for ground truthing document images for pixel-accurate segmentation. First, we describe ground-truthing policies that apply at four different scales: (1) paragraph, (2) text line, (3) character, and (4) pixel. We then analyze difficult and/or ambiguous cases that will challenge any policy, e.g., blank space and overlapping content. Experiments have shown the benefit of using "tighter" zones that capture more detail (e.g., at the text-line level instead of the paragraph level). We show that tighter ground truth significantly improves classification results, by 45% in recent experiments. It is important to face the fact that a pixel-accurate segmentation can be better than manually obtained ground truth. In practice, of course, perfectly accurate pixel-level ground truth may not be achievable, but we believe it is important to explore methods to semi-automatically improve existing ground truth.
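A toy sketch of why tighter zones capture more detail, assuming (as in many zoning schemes) that every pixel inside a ground-truth zone inherits that zone's label, so blank pixels between words and lines end up labeled as text. This is a hypothetical illustration, not the authors' method; the tiny page, the inclusive (top, left, bottom, right) box convention, and the helper names `bounding_box` and `mislabeled_blank_pixels` are all assumptions made for the example.

```python
# Hypothetical illustration (not the paper's method): count blank pixels that a
# zone-based ground truth labels as "text" under paragraph-level vs. line-level zones.
from typing import List, Set, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive bounds
Pixel = Tuple[int, int]          # (row, col)


def bounding_box(ink: List[Pixel]) -> Box:
    """Smallest axis-aligned box enclosing all given ink pixels."""
    rows = [r for r, _ in ink]
    cols = [c for _, c in ink]
    return (min(rows), min(cols), max(rows), max(cols))


def mislabeled_blank_pixels(zones: List[Box], ink: Set[Pixel]) -> int:
    """Count pixels covered by any zone that carry no ink (blank, but labeled text)."""
    covered: Set[Pixel] = set()
    for top, left, bottom, right in zones:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                covered.add((r, c))
    return len(covered - ink)


# Toy page: line 1 (rows 0-1) has two "words" with a blank gap at cols 4-5;
# row 2 is blank inter-line space; line 2 (rows 3-4) is one solid word.
line1 = [(r, c) for r in (0, 1) for c in (0, 1, 2, 3, 6, 7, 8, 9)]
line2 = [(r, c) for r in (3, 4) for c in range(0, 6)]
ink = set(line1 + line2)

paragraph_zone = [bounding_box(line1 + line2)]            # one loose zone
line_zones = [bounding_box(line1), bounding_box(line2)]   # tighter zones

print(mislabeled_blank_pixels(paragraph_zone, ink))  # prints 22
print(mislabeled_blank_pixels(line_zones, ink))      # prints 4
```

The line-level zones drop the blank inter-line row and most of the margin, but still mislabel the word gap; only a pixel-level policy would remove that remaining error in this toy case.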
Keywords :
document image processing; image classification; image segmentation; character level ground truthing policy; ground truthing document image; paragraph level ground truthing policy; pixel level ground truthing policy; pixel level image classification; pixel-accurate image segmentation; text line ground truthing policy; Computer science; Drives; Image analysis; Image databases; Image retrieval; Image segmentation; Pixel; Shape; Text analysis; Uniform resource locators; document content extraction; document content inventory; document content retrieval; document image analysis; ground-truthing; pixel accurate segmentation; zoning;
Conference_Title :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
DOI :
10.1109/DAS.2008.47