DocumentCode
3498131
Title
Distance measures for layout-based document image retrieval
Author
Van Beusekom, Joost ; Keysers, Daniel ; Shafait, Faisal ; Breuel, Thomas M.
Author_Institution
Image Understanding & Pattern Recognition Res. Group, Tech. Univ. of Kaiserslautern
fYear
2006
fDate
27-28 April 2006
Lastpage
242
Abstract
Most methods for document image retrieval rely solely on text information to find similar documents. This paper describes a way to use layout information for document image retrieval instead. A new class of distance measures is introduced for documents with Manhattan layouts, based on a two-step procedure: first, the distances between the blocks of two layouts are calculated; then, in a matching step, the blocks of one layout are assigned to the blocks of the other. Different block distances and matching methods are compared and evaluated on the publicly available MARG database. On this dataset, the layout type is determined correctly in 92.6% of the cases using the best distance measure in a nearest neighbor classifier. The experiments show that the best distance measure for this task uses the overlapping area combined with the Manhattan distance of the corner points as the block distance, together with minimum weight edge cover matching.
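The two-step procedure described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. The block representation (normalized rectangles `(x0, y0, x1, y1)`), the way overlap and corner distance are combined in the hypothetical `block_distance` (weight `alpha`), and all function names are assumptions. For the matching step, the sketch uses a standard reduction of minimum weight edge cover to minimum weight matching. The brute-force search over permutations is only practical for layouts with a handful of blocks. For larger layouts, an assignment solver such as `scipy.optimize.linear_sum_assignment` would take its place.

```python
from itertools import permutations

# Assumed representation: a block is an axis-aligned rectangle (x0, y0, x1, y1)
# with coordinates normalized to the page size and x0 < x1, y0 < y1.

def corners(b):
    x0, y0, x1, y1 = b
    return [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]

def corner_manhattan(a, b):
    """Sum of Manhattan distances between corresponding corner points."""
    return sum(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in zip(corners(a), corners(b)))

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def overlap(a, b):
    """Area of the intersection of two blocks (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def block_distance(a, b, alpha=1.0):
    # Hypothetical combination: corner distance plus alpha times the
    # non-overlapping area (symmetric difference) of the two blocks.
    return corner_manhattan(a, b) + alpha * (area(a) + area(b) - 2 * overlap(a, b))

def edge_cover_distance(layout_a, layout_b, dist=block_distance):
    """Cost of a minimum weight edge cover of the complete bipartite graph
    between the blocks of two non-empty layouts (non-negative weights).

    Reduction: cover cost = sum over all blocks of their cheapest edge
    + minimum weight matching on reduced weights w(u, v) - mu(u) - mu(v).
    The matching is found by brute force, so only small layouts are feasible."""
    n, m = len(layout_a), len(layout_b)
    w = [[dist(a, b) for b in layout_b] for a in layout_a]
    mu_a = [min(row) for row in w]
    mu_b = [min(w[i][j] for i in range(n)) for j in range(m)]
    k = max(n, m)
    # Reduced weights clipped at 0 (such edges never lower the cost), padded
    # with zeros so that a dummy pair means "leave this block unmatched".
    c = [[min(0.0, w[i][j] - mu_a[i] - mu_b[j]) if i < n and j < m else 0.0
          for j in range(k)] for i in range(k)]
    best = min(sum(c[i][p[i]] for i in range(k)) for p in permutations(range(k)))
    return sum(mu_a) + sum(mu_b) + best

def nearest_neighbor_label(query, labeled_layouts):
    """Return the label of the stored (layout, label) pair closest to the query."""
    return min(labeled_layouts, key=lambda item: edge_cover_distance(query, item[0]))[1]
```

For example, a full-page block compared with a layout of two half-page blocks must be matched to both of them, because an edge cover has to touch every block. Its distance is therefore the sum of the two block distances.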
Keywords
document image processing; image matching; image retrieval; MARG database; Manhattan distance; Manhattan layouts; best distance measure; block distances; distance measures; layout information; layout-based document image retrieval; minimum weight edge cover matching; nearest neighbor classifier; Algorithm design and analysis; Area measurement; Artificial intelligence; Current measurement; Databases; Image retrieval; Information retrieval; Nearest neighbor searches; Optical character recognition software; Pattern recognition
fLanguage
English
Publisher
IEEE
Conference_Title
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location
Lyon
Print_ISBN
0-7695-2531-8
Type
conf
DOI
10.1109/DIAL.2006.16
Filename
1612965