DocumentCode :
594724
Title :
Detecting near-duplicate document images using interest point matching
Author :
Vitaladevuni, Shiv ; Choi, F. ; Prasad, Ranga ; Natarajan, Prem
Author_Institution :
Raytheon BBN Technol., Cambridge, MA, USA
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
347
Lastpage :
350
Abstract :
We present an approach to detecting near-duplicate document images using SIFT interest point matching. Given a set of document images, SIFT features are extracted from each image and stored in a kd-tree, which serves as the feature database. The near-duplicates of a query image are estimated by directly matching its SIFT descriptors against the feature database. We demonstrate the approach on a challenging set of more than 16,000 unconstrained Arabic hand- and machine-written document images obtained from the field. Our experiments indicate that the approach detects near-duplicates with a low false alarm rate and outperforms a bag-of-words-based approach.
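The pipeline outlined in the abstract (descriptors stored in a kd-tree, query descriptors matched directly against it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes descriptors have already been extracted (real SIFT descriptors are 128-dimensional; the demo uses synthetic 8-dimensional vectors), and the distance threshold `max_dist`, the per-document vote count, and all function names are hypothetical choices made for illustration only.

```python
import random
from collections import Counter


def build_kdtree(points, depth=0):
    """Build a kd-tree from a list of (descriptor, doc_id) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (squared_distance, doc_id) of the stored descriptor closest to query."""
    if node is None:
        return best
    (desc, doc_id), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(desc, query))
    if best is None or d2 < best[0]:
        best = (d2, doc_id)
    diff = query[axis] - desc[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Search the other side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def rank_near_duplicates(tree, query_descriptors, max_dist=0.1):
    """Assumed scoring rule: one vote per query descriptor whose nearest
    database descriptor lies within max_dist; documents ranked by votes."""
    votes = Counter()
    for q in query_descriptors:
        d2, doc_id = nearest(tree, q)
        if d2 <= max_dist ** 2:
            votes[doc_id] += 1
    return votes.most_common()


def demo():
    """Synthetic stand-in for SIFT descriptors: three documents, 50 vectors each."""
    rng = random.Random(0)
    dim = 8
    database = [([rng.random() for _ in range(dim)], doc_id)
                for doc_id in ("doc_a", "doc_b", "doc_c") for _ in range(50)]
    tree = build_kdtree(database)
    # The query is a slightly perturbed copy of doc_b, imitating a near-duplicate scan.
    query = [[x + rng.uniform(-0.01, 0.01) for x in desc]
             for desc, doc_id in database if doc_id == "doc_b"]
    return rank_near_duplicates(tree, query)


if __name__ == "__main__":
    print(demo())
```

In this sketch the document with the most votes is the top near-duplicate candidate. A production system would more likely rely on an established SIFT extractor and an approximate nearest-neighbour index than on a hand-written exact kd-tree search, which scales poorly in 128 dimensions.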
Keywords :
document image processing; feature extraction; image matching; natural language processing; tree data structures; SIFT descriptors; SIFT feature extraction; SIFT interest point matching; bag-of-words-based approach; false alarm rate; feature database; kd-tree storage; machine written images; near-duplicate document image detection; query image estimation; unconstrained Arabic hand; Feature extraction; Image databases; Image segmentation; Imaging; Optical character recognition software; Shape;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460143