DocumentCode :
3745683
Title :
Euclidean-Based Entity Resolution for Evolving Data
Author :
Chang Lu;Hongzhi Wang;Yan Zhang;Hong Gao
Author_Institution :
CS Dept., Harbin Inst. of Technol., Harbin, China
fYear :
2015
Firstpage :
1547
Lastpage :
1552
Abstract :
As large companies and corporations take on a growing share of data collection, an increasing number of researchers in recent years have proposed a variety of algorithms and theories to solve database problems. Although existing solutions are effective in many cases, many problems remain unsolved in database integration. Entity resolution (ER) is a crucial one. ER is used in many applications during the updating and loading of big data sets, and evolving data needs it most. Evolving data sets are now widely used in biology and computer information, where they contain microscope observations and biological information. Although researchers have proposed different ER methods, the cost of ER is usually too high to accept. We use Euclidean vectors in a high-dimensional space to model the states of different entities in a big data set. We combine this approach with a parallel improved Top-K algorithm to detect the identity of an entity more effectively. Theoretical analysis and experimental results show that the proposed method performs entity resolution on evolving data effectively and efficiently.
Keywords :
"Databases","Algorithm design and analysis","Erbium","Clustering algorithms","Euclidean distance","Computers","Computational modeling"
Publisher :
IEEE
Conference_Title :
Instrumentation and Measurement, Computer, Communication and Control (IMCCC), 2015 Fifth International Conference on
Type :
conf
DOI :
10.1109/IMCCC.2015.328
Filename :
7406109