DocumentCode
3147492
Title
A fast entity resolution method based on wave of records
Author
Liu, Yongnan ; Wang, Hongzhi ; Gao, Hong
Author_Institution
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
fYear
2011
fDate
16-18 April 2011
Firstpage
4642
Lastpage
4645
Abstract
Given a large data collection, entity resolution finds the records that refer to the same entity. A crucial step of entity resolution is computing the similarity between records. Without careful design, a method may have to compare every character of two records only to obtain a low similarity value. In this paper, we propose a novel method based on the wave of a record: a sequence of character frequencies in which equal frequencies of different characters are treated as distinct. The Wave structure in our algorithm sharply reduces the number of comparisons needed to compute similarity through two techniques. First, it filters out record pairs whose waves are not similar. Second, it estimates the maximum similarity that the remaining parts of the records can still reach; when this bound is too small, the algorithm ends the computation as early as possible without false negatives. We demonstrate the effectiveness of our algorithm through a thorough experimental evaluation on real-life data sets.
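The abstract does not give the paper's exact definitions, so the following is only a minimal Python sketch under stated assumptions. It assumes records are compared as character multisets with a Dice-style similarity, 2 * sum of min(f_a(c), f_b(c)) over characters c, divided by (|a| + |b|). It also assumes a wave is the record's (character, frequency) pairs sorted by descending frequency. The names `make_wave` and `dice_with_early_exit` are hypothetical. Walking the shorter record's wave from its most frequent characters downward makes the upper bound on the achievable overlap fall fastest when two waves disagree, so dissimilar pairs are dropped after a few entries. Because the bound never underestimates, no pair that meets the threshold is lost. The first bound check reduces to a plain length filter; the paper's own wave-similarity filter is not described in the abstract and is not reproduced here.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical representation (assumption, not the paper's definition):
# a "wave" is the record's (character, frequency) pairs sorted by descending
# frequency. Different characters that share a frequency remain separate
# entries, echoing the abstract's remark that they are treated as different.
Wave = List[Tuple[str, int]]


def make_wave(record: str) -> Wave:
    counts = Counter(record)
    # Sort by frequency (descending), then by character for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def dice_with_early_exit(a: str, b: str, threshold: float) -> Optional[float]:
    """Assumed measure: character-multiset Dice similarity.

    Returns the similarity if it reaches `threshold`, otherwise None.
    Stops as soon as an upper bound proves the threshold is unreachable.
    """
    if len(a) > len(b):
        a, b = b, a  # walk the shorter record's wave for a tighter bound
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty records are treated as identical
    counts_b = Counter(b)
    overlap = 0
    remaining = len(a)  # characters of `a` whose matches are not yet counted
    for char, freq in make_wave(a):
        # Upper bound: at best, every remaining character of `a` still matches.
        if 2 * (overlap + remaining) / total < threshold:
            return None  # pruned early; the bound rules out false negatives
        overlap += min(freq, counts_b[char])
        remaining -= freq
    score = 2 * overlap / total
    return score if score >= threshold else None
```

For example, `dice_with_early_exit("aaabc", "bcdef", 0.6)` returns `None` after comparing only the character `a`, while `dice_with_early_exit("banana", "bandana", 0.8)` walks the whole wave and returns 12/13.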
Keywords
data handling; data collection; entity resolution method; record pair filtering; record wave; wave structure; Algorithm design and analysis; Clustering algorithms; Complexity theory; Databases; Filtering algorithms; Heuristic algorithms; Nickel; entity resolution; signature generation; similarity computation;
fLanguage
English
Publisher
ieee
Conference_Titel
Consumer Electronics, Communications and Networks (CECNet), 2011 International Conference on
Conference_Location
XianNing
Print_ISBN
978-1-61284-458-9
Type
conf
DOI
10.1109/CECNET.2011.5768200
Filename
5768200