• DocumentCode
    2078179
  • Title
    An Efficient Similarity Searching Scheme in Massive Databases
  • Author
    Shen, Haiying ; Li, Ting ; Schweiger, Tom
  • Author_Institution
    Dept. of Comput. Sci. & Comput. Eng., Univ. of Arkansas, Fayetteville, AR
  • fYear
    2008
  • fDate
    June 29 2008-July 5 2008
  • Firstpage
    47
  • Lastpage
    52
  • Abstract
    Locality sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high dimensional data. It is a popular technique for approximate nearest neighbor search. However, LSH needs large memory space and long processing time to achieve good performance when searching a massive dataset. In addition, it is not effective at locating similar data in a very high dimensional dataset. This paper proposes a new LSH-based similarity searching scheme, namely SMLSH. It combines a consistent hash function and min-wise independent permutations with LSH. SMLSH classifies information according to similarity efficiently and with a reduced memory space requirement. It can quickly locate similar data in a massive dataset. Experimental results show that SMLSH is both time and space efficient in comparison with LSH. It yields significant improvements over LSH in the effectiveness of similarity searching in a massive dataset.
  • Keywords
    data reduction; minimisation; probability; search problems; very large databases; LSH-based similarity searching; approximate nearest neighbor search; consistent hash function; efficient similarity searching; high dimensional data; locality sensitive hashing; massive databases; min-wise independent permutation; probabilistic dimension reduction; Computer science; Costs; Data engineering; Data structures; Databases; Delay; High performance computing; Nearest neighbor searches; Telecommunication computing; Tree data structures; Locality Sensitive Hashing (LSH); Min-Wise Independent Permutations; Similarity Search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Telecommunications, 2008. ICDT '08. The Third International Conference on
  • Conference_Location
    Bucharest
  • Print_ISBN
    978-0-7695-3188-5
  • Electronic_ISBN
    978-0-7695-3188-5
  • Type
    conf
  • DOI
    10.1109/ICDT.2008.12
  • Filename
    4561284