Title :
Learning Image Anchor Templates for Document Classification and Data Extraction
Author_Institution :
Perceptual Document Anal. Area, Palo Alto Res. Center, Palo Alto, CA, USA
Abstract :
Image anchor templates are used in document image analysis for document classification, data localization, and other tasks. Current tools allow human operators to mark out small sub-images from documents to act as anchor templates. However, this requires time and expertise, because operators have to make informed decisions based on the behavior of the template matching algorithms and the expected degradation patterns in documents. We propose learning templates for a task automatically and quickly from a few training examples. Document classification or data localization can be done more robustly by combining evidence from many more discriminating templates (e.g., hundreds) than would be practicable for operators to specify.
Keywords :
document handling; image classification; image matching; information retrieval; object recognition; data extraction; data localization; document classification; document image analysis; image anchor templates learning; template matching algorithms; Degradation; Humans; NIST; Robustness; Text analysis; Training; Automatic Document Recognition; Data Extraction; Document Classification; Field Localization; Forms Processing; Image Anchor Template
Conference_Title :
2010 20th International Conference on Pattern Recognition (ICPR)
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.837