DocumentCode
594724
Title
Detecting near-duplicate document images using interest point matching
Author
Vitaladevuni, Shiv ; Choi, F. ; Prasad, Ranga ; Natarajan, Prem
Author_Institution
Raytheon BBN Technologies, Cambridge, MA, USA
fYear
2012
fDate
11-15 Nov. 2012
Firstpage
347
Lastpage
350
Abstract
We present an approach to detecting near-duplicate document images using SIFT interest point matching. Given a set of document images, SIFT features are extracted from each image and stored in a kd-tree, which serves as the feature database. The near-duplicates of a query image are estimated by directly matching its SIFT descriptors against the feature database. We demonstrate the approach on a challenging set of unconstrained handwritten and machine-written Arabic document images obtained from the field, consisting of more than 16,000 documents. Our experiments indicate that the approach detects near-duplicates with a low false alarm rate and outperforms a bag-of-words-based approach.
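The pipeline summarized in the abstract (descriptors from all images stored in one kd-tree, query descriptors matched directly against it) can be illustrated with a minimal sketch. This is not the authors' implementation. The short 3-D tuples stand in for 128-dimensional SIFT descriptors, and the per-image vote counting, the `max_dist` threshold, and all function names are assumptions introduced for illustration only.

```python
import math
from collections import Counter


def build_kdtree(points, depth=0):
    """Build a kd-tree from (descriptor, image_id) pairs.

    Each node is a tuple: (point, split_axis, left_subtree, right_subtree).
    """
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (distance, image_id) of the stored descriptor closest to query."""
    if node is None:
        return best
    (desc, image_id), axis, left, right = node
    dist = math.dist(desc, query)
    if best is None or dist < best[0]:
        best = (dist, image_id)
    diff = query[axis] - desc[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def rank_near_duplicates(tree, query_descriptors, max_dist):
    """Assumed scoring: each query descriptor votes for the image owning its
    nearest database descriptor, provided the match is within max_dist."""
    votes = Counter()
    for q in query_descriptors:
        hit = nearest(tree, q)
        if hit is not None and hit[0] <= max_dist:
            votes[hit[1]] += 1
    return votes.most_common()


# Toy database: hypothetical image ids mapped to stand-in descriptors.
database = {
    "doc_a": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "doc_b": [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)],
}
tree = build_kdtree([(d, name) for name, descs in database.items() for d in descs])

# Query descriptors: two near doc_a, one near doc_b, one far from everything.
query = [(0.1, 0.0, 0.0), (0.9, 0.1, 0.0), (5.1, 5.0, 5.0), (20.0, 20.0, 20.0)]
ranking = rank_near_duplicates(tree, query, max_dist=0.5)
print(ranking)  # [('doc_a', 2), ('doc_b', 1)]
```

In this sketch the image with the most matched descriptors ranks first, and the distance threshold keeps unrelated descriptors from voting. A real system would extract actual SIFT descriptors and would likely use an approximate kd-tree search, since exact search degrades in 128 dimensions.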
Keywords
document image processing; feature extraction; image matching; natural language processing; tree data structures; SIFT descriptors; SIFT feature extraction; SIFT interest point matching; bag-of-words-based approach; false alarm rate; feature database; kd-tree storage; machine written images; near-duplicate document image detection; query image estimation; unconstrained Arabic hand; Feature extraction; Image databases; Image segmentation; Imaging; Optical character recognition software; Shape;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location
Tsukuba
ISSN
1051-4651
Print_ISBN
978-1-4673-2216-4
Type
conf
Filename
6460143