DocumentCode
3190084
Title
Document classification using layout analysis
Author
Hu, Jianying ; Kashi, Ramanujan ; Wilfong, Gordon
Author_Institution
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
fYear
1999
fDate
1999
Firstpage
556
Lastpage
560
Abstract
This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR; the result can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of the features derived from interval encoding in a hidden Markov model-based page layout classification system that is trainable and extensible.
Keywords
document image processing; encoding; hidden Markov models; image classification; document classification; document image classification; hidden Markov model; interval coding; interval encoding; layout analysis; region layout information; spatial layout level; Content based retrieval; Data mining; Electrical capacitance tomography; Feature extraction; Image segmentation; Optical character recognition software; Read only memory; Routing; Shape measurement
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications, 1999. Proceedings. Tenth International Workshop on
Conference_Location
Florence
Print_ISBN
0-7695-0281-4
Type
conf
DOI
10.1109/DEXA.1999.795245
Filename
795245