Robust Hashing With Local Models for Approximate Similarity Search

Author

Jingkuan Song ; Yi Yang ; Xuelong Li ; Zi Huang ; Yang Yang

Author_Institution

Sch. of Inf. Technol. & Electr. Eng., Univ. of Queensland, Brisbane, QLD, Australia

Volume

44

Issue

7

fYear

2014

fDate

Jul-14

Firstpage

1225

Lastpage

1236

Abstract

Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing ℓ_2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.

Keywords

computational complexity; file organisation; query processing; RHLM; approximate similarity search; binary hash codes; database point; dimensionality curse; feature dimensionality; high-dimensional data; high-dimensional data point mapping; l_2,1-norm minimization; local hashing model; local structural information; loss function; optimal hash code; query data point; query hash code; real-life datasets; robust hash function learning; robust hashing-with-local models; search efficiency; search quality; training data points; training dataset; Databases; Laplace equations; Linear programming; Nickel; Robustness; Training; Training data; Approximate similarity search; indexing; robust hashing;

fLanguage

English

Journal_Title

Cybernetics, IEEE Transactions on

Publisher

ieee

ISSN

2168-2267

Type

jour

DOI

10.1109/TCYB.2013.2289351

Filename

6714849