DocumentCode :
1634774
Title :
Classifying Foreground Pixels in Document Images
Author :
Sarkar, Prateek ; Saund, Eric ; Lin, Jing
Author_Institution :
Perceptual Document Anal., Palo Alto Res. Center, Palo Alto, CA, USA
fYear :
2009
Firstpage :
641
Lastpage :
645
Abstract :
We present a system that classifies pixels in a document image according to marking type such as machine print, handwriting, and noise. A segmenter module first splits an input image into fragments, sometimes breaking connected components. Each fragment is then classified by an automatically trained multi-stage classifier that is fast and considers features of the fragment, as well as its neighborhood. Features relevant for discrimination are picked out automatically from among hundreds of measurements. Our system is trainable from example images in which each foreground pixel has a "ground-truth" label. The main distinction of our system is the level of accuracy achieved in classifying fragments at sub-connected component level, rather than larger aggregate groups such as words or text-lines. We have trained this system to detect handwriting, machine print text, machine print graphics, and noise.
Keywords :
document image processing; handwriting recognition; image classification; image segmentation; document image; foreground pixel classification; ground-truth label; machine print; multistage classifier; segmenter module; subconnected component level; Aggregates; Graphics; Handwriting recognition; Humans; Image analysis; Image recognition; Image segmentation; Pixel; Text analysis; Writing; context based classification; handwriting detection; mark classification; pixel classifier;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
ISSN :
1520-5363
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2009.252
Filename :
5277566