DocumentCode :
2008926
Title :
Similar pair identification using locality-sensitive hashing technique
Author :
Kyung Mi Lee ; Keon Myung Lee
Author_Institution :
Dept. of Comput. Sci. & PT-ERC, Chungbuk Nat. Univ., Cheongju, South Korea
fYear :
2012
fDate :
20-24 Nov. 2012
Firstpage :
2117
Lastpage :
2119
Abstract :
Huge volumes of data pose many opportunities and challenges in business and information societies. The similar pair identification problem arises in various fields such as image retrieval, near-duplicate document identification, plagiarism analysis, and entity resolution. As the number of items increases, pair-wise similarity comparison becomes inefficient. Various techniques have been developed to handle this problem efficiently. Locality-sensitive hashing is one such technique; it avoids exhaustive pair-wise comparisons when finding similar pairs. This paper introduces a modified version of the projection-based locality-sensitive hashing technique. The proposed method reduces the chance that similar pairs fall into different buckets, which is one of the major drawbacks of the projection-based technique. We have observed that the proposed method outperforms the conventional projection-based method in that it achieves a better recall rate at the cost of some additional memory and computation.
Keywords :
cryptography; entity resolution; image retrieval; locality-sensitive hashing technique; near-duplicate document identification; pair-wise similarity; plagiarism analysis; projection-based locality sensitive hashing; similar pair identification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2012 Joint 6th International Conference on Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS)
Conference_Location :
Kobe
Print_ISBN :
978-1-4673-2742-8
Type :
conf
DOI :
10.1109/SCIS-ISIS.2012.6505385
Filename :
6505385
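Illustrative sketch :
The abstract refers to conventional projection-based locality-sensitive hashing without giving its details. The sketch below is a minimal, generic version of that conventional scheme, not the modified method proposed in the paper. It assumes the common formulation h(v) = floor((a . v + b) / w) with a Gaussian random vector a and a random offset b; the names make_hash and candidate_pairs and the parameters num_tables, hashes_per_table, and width are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def make_hash(dim, width, rng):
    """Build one projection hash h(v) = floor((a . v + b) / width).

    `a` is a random Gaussian direction and `b` a random offset in [0, width).
    This formulation is an assumption; the record does not specify it.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(v):
        projection = sum(x * y for x, y in zip(a, v))
        return math.floor((projection + b) / width)

    return h


def candidate_pairs(points, num_tables=4, hashes_per_table=2, width=1.0, seed=0):
    """Return index pairs (i, j), i < j, that share a bucket in any table."""
    rng = random.Random(seed)
    dim = len(points[0])
    pairs = set()
    for _ in range(num_tables):
        # Each table concatenates several projection hashes into one bucket key.
        hashes = [make_hash(dim, width, rng) for _ in range(hashes_per_table)]
        buckets = defaultdict(list)
        for idx, point in enumerate(points):
            key = tuple(h(point) for h in hashes)
            buckets[key].append(idx)
        # Only items inside the same bucket are compared later,
        # which avoids the full pair-wise comparison.
        for members in buckets.values():
            pairs.update(combinations(members, 2))
    return pairs
```

In this kind of scheme, two nearby points whose projections lie on opposite sides of a bucket boundary receive different keys, so a truly similar pair can be missed. That is the drawback the abstract says the proposed method reduces; using several tables, as above, is one generic way to lower the miss rate and is not claimed to be the authors' approach.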