Title :
Parallel NoSQL Entity Resolution Approach with MapReduce
Author_Institution :
Shandong Provincial Key Lab. of Network Based Intell. Comput., Univ. of Jinan, Jinan, China
Abstract :
To address the limitations of entity resolution for NoSQL documents, we propose a new parallel NoSQL entity resolution approach with MapReduce. Although the current MapReduce framework enables efficient parallel execution of entity resolution, it cannot easily find duplicates that fall in adjacent blocks. Therefore, we investigate a possible solution, called Partition-Sort-Map-Reduce, that finds such duplicates by overlapping the boundary objects of adjacent blocks. Finally, our experimental evaluation on NoSQL breeding data and our analysis of time complexity show the high effectiveness and efficiency of the proposed entity resolution approach.
Keywords :
"Sorting","Time complexity","Batch production systems","Parallel processing","Artificial intelligence","Tin"
Conference_Title :
Intelligent Networking and Collaborative Systems (INCOS), 2015 International Conference on
DOI :
10.1109/INCoS.2015.16