Title :
A Mutual-Information-Based Approach to Entity Reconciliation in Heterogeneous Databases
Author :
Bao-hua Qiang ; Jian-qing Xi
Author_Institution :
School of Computer Science & Engineering, South China University of Technology, Guangzhou
Abstract :
Entity reconciliation is crucial to data interoperability among heterogeneous databases. In our previous work, we proposed an entity-matching algorithm based on attribute entropy to identify corresponding entities; it overcomes limitations of the main existing approaches and noticeably improves matching precision. Further research showed, however, that attributes of different importance for identifying entities can receive the same weight when weights are derived from attribute entropy alone. In this paper we therefore use mutual information to quantify attribute weights, since mutual information captures the dependence between the probability distributions of two attributes. Based on this idea, we present an algorithm for computing the final entropy and a mutual-information-based entity reconciliation algorithm. Experimental results on real-world data show that the mutual-information-based approach achieves better performance.
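The abstract's key observation is that entropy alone can give two attributes the same weight even when they relate very differently to the rest of the record, whereas mutual information distinguishes them. The sketch below illustrates only that observation; it is not the authors' weighting or reconciliation algorithm, whose details the abstract does not give. The helper names `entropy` and `mutual_information` and the toy columns `name`, `dept`, and `city` are hypothetical, and both measures are computed from empirical frequencies in bits.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) in bits of one attribute's empirical distribution."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two attribute columns.

    I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y))).
    With counts, p(x,y) / (p(x) * p(y)) simplifies to c_xy * n / (c_x * c_y).
    """
    n = len(xs)
    cx, cy = Counter(xs), Counter(ys)
    cxy = Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())


# Hypothetical toy records: 'dept' is fully determined by 'name',
# while 'city' is statistically independent of 'name'.
name = ["Li", "Li", "Wang", "Wang"]
dept = ["CS", "CS", "EE", "EE"]
city = ["GZ", "WH", "GZ", "WH"]

# Entropy cannot tell the two attributes apart: both have 1 bit.
print("H(dept) =", entropy(dept))
print("H(city) =", entropy(city))

# Mutual information with 'name' separates them: 1 bit versus 0 bits.
print("I(name; dept) =", mutual_information(name, dept))
print("I(name; city) =", mutual_information(name, city))
```

Here H(dept) and H(city) both equal 1 bit, so a purely entropy-based weight would treat the two attributes identically. Meanwhile I(name; dept) = 1 bit and I(name; city) = 0, which reflects how strongly each attribute depends on another one. How such mutual-information values are turned into attribute weights is left open here.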
Keywords :
data handling; distributed databases; entropy; open systems; attribute entropy; data interoperability; entities matching algorithm; entity reconciliation; heterogeneous databases; mutual-information-based approach; Computer science; Data engineering; Databases; Distributed computing; Educational institutions; Information science; Mutual information; Probability distribution; Software engineering; entities matching
Conference_Title :
2008 International Conference on Computer Science and Software Engineering
Conference_Location :
Wuhan, Hubei, China
Print_ISBN :
978-0-7695-3336-0
DOI :
10.1109/CSSE.2008.535