• DocumentCode
    3143010
  • Title

    Large scale Hamming distance query processing

  • Author

    Liu, Alex X. ; Shen, Ke ; Torng, Eric

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    553
  • Lastpage
    564
  • Abstract
    Hamming distance has been widely used in many application domains, such as near-duplicate detection and pattern recognition. We study Hamming distance range query problems, where the goal is to find all strings in a database that are within a Hamming distance bound k from a query string. If k is fixed, we have a static Hamming distance range query problem. If k is part of the input, we have a dynamic Hamming distance range query problem. For the static problem, the prior art uses lots of memory due to its aggressive replication of the database. For the dynamic range query problem, as far as we know, there is no space and time efficient solution for arbitrary databases. In this paper, we first propose a static Hamming distance range query algorithm called HEngined, which addresses the space issue in prior art by dynamically expanding the query on the fly. We then propose a dynamic Hamming distance range query algorithm called HEngined, which addresses the limitation in prior art using a divide-and-conquer strategy. We implemented our algorithms and conducted side-by-side comparisons on large real-world and synthetic datasets. In our experiments, HEngines uses 4.65 times less space and processes queries 16% faster than the prior art, and HEngined processes queries 46 times faster than linear scan while using only 1.7 times more space.
  • Keywords
    divide and conquer methods; query processing; HEngined; HEngines; divide-and-conquer strategy; dynamic Hamming distance range query problem; large scale Hamming distance query processing; static Hamming distance range query problem; Approximation algorithms; Art; Hamming distance; Heuristic algorithms; High definition video; Query processing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type

    conf

  • DOI
    10.1109/ICDE.2011.5767831
  • Filename
    5767831