Title :
Predictive coding for document layout characterization
Author :
Sauvola, J.; Pietikainen, Matti
Author_Institution :
Machine Vision & Media Process. Group, Oulu Univ.
Abstract :
We propose a new approach to document image layout extraction based on rapid feature analysis, preclassification and predictive coding. First, a set of layout features is used to render the image profile information. A knowledge base is then used to map these early regions to layout labels by rule. Each region found is given a classification tag and a degree of membership in the background, text, picture and line drawing classes. A predictive coding method is then applied to the preclassification information to increase the confidence of each label and to merge the regional domain and the labels into a uniform class without any shape assumption. We have tested our technique on three different databases that together comprise over 1000 document images. The results show a high degree of confidence in region separation and extraction. The main benefits include robust classification, shape independence and rapid computation.
Keywords :
document image processing; feature extraction; image classification; image coding; image segmentation; knowledge based systems; visual databases; background; classification tag; document image databases; document image layout extraction; document layout characterization; image profile information; image regions; knowledge base; layout features; line drawing; picture; preclassification; predictive coding; rapid feature analysis; region extraction; region separation; rule based reasoning; text; Data mining; Feature extraction; Image analysis; Image databases; Image retrieval; Image segmentation; Layout; Predictive coding; Shape; Text analysis
Conference_Title :
Proceedings of the Workshop on Document Image Analysis (DIA '97), 1997
Conference_Location :
San Juan
Print_ISBN :
0-8186-8055-5
DOI :
10.1109/DIA.1997.627091