• DocumentCode
    594724
  • Title
    Detecting near-duplicate document images using interest point matching
  • Author
    Vitaladevuni, Shiv ; Choi, F. ; Prasad, Ranga ; Natarajan, Prem
  • Author_Institution
    Raytheon BBN Technol., Cambridge, MA, USA
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    347
  • Lastpage
    350
  • Abstract
    We present an approach to detecting near-duplicate document images using SIFT interest point matching. Given a set of document images, a database is constructed from the SIFT features extracted from each image and stored as a kd-tree. The near-duplicates of a query image are estimated by directly matching its SIFT descriptors against the feature database. We demonstrate the approach on a challenging set of more than 16,000 unconstrained handwritten and machine-written Arabic document images obtained from the field. Our experiments indicate that the approach detects near-duplicates with a low false alarm rate and outperforms a bag-of-words-based approach.
  • Keywords
    document image processing; feature extraction; image matching; natural language processing; tree data structures; SIFT descriptors; SIFT feature extraction; SIFT interest point matching; bag-of-words-based approach; false alarm rate; feature database; kd-tree storage; machine written images; near-duplicate document image detection; query image estimation; unconstrained Arabic hand; Feature extraction; Image databases; Image segmentation; Imaging; Optical character recognition software; Shape;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2012 21st International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460143
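The abstract describes the pipeline only at a high level: pool SIFT descriptors from every database image into one kd-tree, then match each query descriptor against it. The following is a minimal sketch of that idea, not the authors' implementation. Everything beyond kd-tree nearest-neighbour matching is a hypothetical illustration: Lowe's ratio test, per-image vote counting, the `ratio` and `min_fraction` thresholds, and the names `NearDuplicateIndex` and `query`. Real SIFT descriptors are 128-dimensional and would come from a SIFT extractor; toy 2-D vectors stand in for them here.

```python
import heapq
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class _Node:
    """One kd-tree node holding a descriptor and the id of its source image."""

    __slots__ = ("point", "image_id", "axis", "left", "right")

    def __init__(self, point: Vector, image_id: int, axis: int,
                 left: "Optional[_Node]", right: "Optional[_Node]") -> None:
        self.point, self.image_id, self.axis = point, image_id, axis
        self.left, self.right = left, right


def _build(items: List[Tuple[Vector, int]], depth: int = 0) -> Optional[_Node]:
    """Recursively split on the median along a cycling axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, image_id = items[mid]
    return _Node(point, image_id, axis,
                 _build(items[:mid], depth + 1),
                 _build(items[mid + 1:], depth + 1))


class NearDuplicateIndex:
    """Hypothetical index: descriptors of all database images in one kd-tree."""

    def __init__(self, database: Dict[int, Sequence[Sequence[float]]]) -> None:
        items = [(tuple(d), img) for img, descs in database.items() for d in descs]
        self.root = _build(items)

    def _two_nearest(self, q: Vector) -> List[Tuple[float, int]]:
        """Return up to two (squared distance, image_id) pairs, closest first."""
        best: List[Tuple[float, int]] = []  # min-heap on -d2: best[0] is the worse of the two

        def visit(node: Optional[_Node]) -> None:
            if node is None:
                return
            d2 = sum((a - b) ** 2 for a, b in zip(node.point, q))
            if len(best) < 2:
                heapq.heappush(best, (-d2, node.image_id))
            elif d2 < -best[0][0]:
                heapq.heapreplace(best, (-d2, node.image_id))
            diff = q[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            visit(near)
            # Cross the splitting plane only if it could hold a closer point.
            if len(best) < 2 or diff * diff < -best[0][0]:
                visit(far)

        visit(self.root)
        return sorted((-neg_d2, img) for neg_d2, img in best)

    def query(self, descriptors: Sequence[Sequence[float]],
              ratio: float = 0.8, min_fraction: float = 0.3) -> List[Tuple[int, float]]:
        """Each query descriptor votes for the image owning its nearest neighbour.

        A vote counts only if it passes Lowe's ratio test (compared on squared
        distances). Images whose share of votes reaches min_fraction are returned
        as near-duplicate candidates, most votes first.
        """
        votes: Counter = Counter()
        for d in descriptors:
            nn = self._two_nearest(tuple(d))
            if not nn:
                continue
            if len(nn) == 1 or nn[0][0] < ratio ** 2 * nn[1][0]:
                votes[nn[0][1]] += 1
        n = max(len(descriptors), 1)
        return [(img, v / n) for img, v in votes.most_common() if v / n >= min_fraction]
```

For example, `NearDuplicateIndex({0: [(0, 0), (1, 0), (0, 1)], 1: [(10, 10), (11, 10), (10, 11)]}).query([(0.1, 0.0), (1.0, 0.1), (0.0, 1.05)])` reports image 0 with every descriptor voting for it. Building the tree once and querying it per image mirrors the database/query split in the abstract. For 128-dimensional descriptors an exact kd-tree search degrades toward brute force, so large collections typically rely on approximate nearest-neighbour variants instead.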