• DocumentCode
    65685
  • Title

    Batch-Orthogonal Locality-Sensitive Hashing for Angular Similarity

  • Author

    Jianqiu Ji ; Shuicheng Yan ; Jianmin Li ; Guangyu Gao ; Qi Tian ; Bo Zhang

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    36
  • Issue
    10
  • fYear
    2014
  • fDate
    Oct. 1 2014
  • Firstpage
    1963
  • Lastpage
    1974
  • Abstract
    Sign-random-projection locality-sensitive hashing (SRP-LSH) is a widely used hashing method, which provides an unbiased estimate of pairwise angular similarity, yet may suffer from its large estimation variance. We propose in this work batch-orthogonal locality-sensitive hashing (BOLSH), as a significant improvement of SRP-LSH. Instead of independent random projections, BOLSH makes use of batch-orthogonalized random projections, i.e, we divide random projection vectors into several batches and orthogonalize the vectors in each batch respectively. These batch-orthogonalized random projections partition the data space into regular regions, and thus provide a more accurate estimator. We prove theoretically that BOLSH still provides an unbiased estimate of pairwise angular similarity, with a smaller variance for any angle in (0, π), compared with SRP-LSH. Furthermore, we give a lower bound on the reduction of variance. The extensive experiments on real data well validate that with the same length of binary code, BOLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, BOLSH shows the superiority in extensive approximate nearest neighbor (ANN) retrieval experiments.
  • Keywords
    cryptography; file organisation; mean square error methods; random processes; ANN retrieval; BOLSH; SRP-LSH; approximate nearest neighbor; batch-orthogonal locality-sensitive hashing; batch-orthogonalized random projection; mean squared error reduction; pairwise angular similarity; random projection vector; sign-random-projection; Binary codes; Educational institutions; Gaussian distribution; Hamming distance; Nearest neighbor searches; Probabilistic logic; Vectors; Sign-random-projection; angular similarity; approximate nearest neighbor search; locality-sensitive hashing;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2014.2315806
  • Filename
    6783789