DocumentCode
3140980
Title
Document image layout comparison and classification
Author
Hu, Jianying ; Kashi, Ramanujan ; Wilfong, Gordon
Author_Institution
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
fYear
1999
fDate
20-22 Sep 1999
Firstpage
285
Lastpage
288
Abstract
The paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual-similarity-based document retrieval as well as for fast algorithms for initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors, which can be used for fast page layout comparison. The paper describes experiments that rank-order a set of document pages by their layout similarity to a test document, and reports the results. We also demonstrate the usefulness of the features derived from interval encoding in a hidden Markov model based page layout classification system that is trainable and extendible.
Keywords
document image processing; encoding; hidden Markov models; image classification; information retrieval; HMM; document image classification; document image layout comparison; document pages; fast algorithms; fast page layout comparison; fixed-length vectors; hidden Markov model based page layout classification system; initial document type classification; interval encoding; layout similarity; region layout information; spatial layout; spatial layout level; test document; visual similarity based document retrieval; Data mining; Electronic switching systems; Image databases; Image retrieval; Information retrieval; Optical character recognition software; Shape measurement; Spatial databases; Spatial resolution; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location
Bangalore
Print_ISBN
0-7695-0318-7
Type
conf
DOI
10.1109/ICDAR.1999.791780
Filename
791780