• DocumentCode
    65137
  • Title

    Robust Hashing With Local Models for Approximate Similarity Search

  • Author

    Jingkuan Song ; Yi Yang ; Xuelong Li ; Zi Huang ; Yang Yang

  • Author_Institution
    Sch. of Inf. Technol. & Electr. Eng., Univ. of Queensland, Brisbane, QLD, Australia
  • Volume
    44
  • Issue
    7
  • fYear
    2014
  • fDate
    Jul-14
  • Firstpage
    1225
  • Lastpage
    1236
  • Abstract
    Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing ℓ2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
  • Keywords
    computational complexity; file organisation; query processing; RHLM; approximate similarity search; binary hash codes; database point; dimensionality curse; feature dimensionality; high-dimensional data; high-dimensional data point mapping; l2,1-norm minimization; local hashing model; local structural information; loss function; optimal hash code; query data point; query hash code; real-life datasets; robust hash function learning; robust hashing-with-local models; search efficiency; search quality; training data points; training dataset; Databases; Laplace equations; Linear programming; Nickel; Robustness; Training; Training data; Approximate similarity search; indexing; robust hashing;
  • fLanguage
    English
  • Journal_Title
    Cybernetics, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    2168-2267
  • Type

    jour

  • DOI
    10.1109/TCYB.2013.2289351
  • Filename
    6714849