Title :
Document classification using layout analysis
Author :
Hu, Jianying ; Kashi, Ramanujan ; Wilfong, Gordon
Author_Institution :
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR; the initial result can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of features derived from interval encoding in a hidden Markov model-based page layout classification system that is trainable and extendible.
Keywords :
document image processing; encoding; hidden Markov models; image classification; document classification; document image classification; hidden Markov model; interval coding; interval encoding; layout analysis; region layout information; spatial layout level; Content based retrieval; Data mining; Electrical capacitance tomography; Feature extraction; Image segmentation; Optical character recognition software; Read only memory; Routing; Shape measurement
Conference_Title :
Proceedings of the Tenth International Workshop on Database and Expert Systems Applications, 1999
Conference_Location :
Florence
Print_ISBN :
0-7695-0281-4
DOI :
10.1109/DEXA.1999.795245