Title :
Data-Dependent Hashing Based on p-Stable Distribution
Author :
Xiao Bai ; Haichuan Yang ; Jun Zhou ; Peng Ren ; Jian Cheng
Author_Institution :
Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
Abstract :
The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on p-stable distribution. We commence by formulating the Euclidean distance preserving property in terms of variance estimation. Based on this property, we develop a projection method, which maps the original data to arbitrary dimensional vectors. Each projection vector is a linear combination of multiple random vectors subject to p-stable distribution, in which the weights for the linear combination are learned based on the training data. An orthogonal matrix is then learned data-dependently for minimizing the thresholding error in quantization. Combining the projection method and orthogonal matrix, we develop an unsupervised hashing scheme, which preserves the Euclidean distance. Compared with data-independent hashing methods, our method takes the data distribution into consideration and gives more accurate hashing results with compact hash codes. Different from many data-dependent hashing methods, our method accommodates multiple hash tables and is not restricted by the number of hash functions. To extend our method to a supervised scenario, we incorporate a supervised label propagation scheme into the proposed projection method. This results in a supervised hashing scheme, which preserves semantic similarity of data. Experimental results show that our methods have outperformed several state-of-the-art hashing approaches in both effectiveness and efficiency.
Keywords :
cryptography; image processing; matrix algebra; Euclidean distance; arbitrary dimensional vectors; data dependent hashing; data distribution; linear combination; multiple random vectors; orthogonal matrix; p-stable distribution; projection method; supervised label propagation scheme; thresholding error; training data; unsupervised hashing scheme; variance estimation; Binary codes; Educational institutions; Euclidean distance; Quantization (signal); Semantics; Training; Vectors; Image retrieval; hash retrieval; p-stable distribution;
Journal_Title :
Image Processing, IEEE Transactions on
DOI :
10.1109/TIP.2014.2352458