DocumentCode :
3594008
Title :
Near-wordless document structure classification
Author :
Summers, Kristen
Volume :
1
fYear :
1995
Firstpage :
462
Abstract :
Automatic derivation of logical document structure from generic layout would enable the development of many highly flexible electronic document manipulation tools. This problem can be divided into the segmentation of text into pieces and the classification of these pieces as particular logical structures. This paper proposes an approach to the classification of logical document structures, according to their distance from predefined prototypes. The prototypes consider linguistic information minimally, thus relying minimally on the accuracy of OCR and decreasing language-dependence. Different classes of logical structures and the differences in the requisite information for classifying them are discussed. A prototype format is proposed, existing prototypes and a distance measurement are described, and performance results are provided.
Keywords :
document handling; document image processing; pattern recognition; OCR; document structure classification; electronic document manipulation tools; language-dependence; logical document structure; prototype format; segmentation of text; Adders; Distance measurement; Graphics; Marine vehicles; Optical character recognition software; Prototypes
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Print_ISBN :
0-8186-7128-9
Type :
conf
DOI :
10.1109/ICDAR.1995.599036
Filename :
599036