DocumentCode :
3190084
Title :
Document classification using layout analysis
Author :
Hu, Jianying ; Kashi, Ramanujan ; Wilfong, Gordon
Author_Institution :
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
fYear :
1999
fDate :
1999
Firstpage :
556
Lastpage :
560
Abstract :
This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR, which can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of the features derived from interval encoding in a hidden Markov model based page layout classification system that is trainable and extendible.
Keywords :
document image processing; encoding; hidden Markov models; image classification; document classification; document image classification; hidden Markov model; interval coding; interval encoding; layout analysis; region layout information; spatial layout level; Content based retrieval; Data mining; Electrical capacitance tomography; Feature extraction; Image classification; Image segmentation; Optical character recognition software; Read only memory; Routing; Shape measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications, 1999. Proceedings. Tenth International Workshop on
Conference_Location :
Florence
Print_ISBN :
0-7695-0281-4
Type :
conf
DOI :
10.1109/DEXA.1999.795245
Filename :
795245