Title :
An empiric weight computation for record linkage using linearly combined fields' similarity scores
Author :
Xinran Li ; Guttmann, Aline ; Demongeot, Jacques ; Boire, Jean-Yves ; Ouchchane, Lemlih
Author_Institution :
ISIT, Auvergne Univ., Clermont-Ferrand, France
Abstract :
Record linkage is the task of identifying which records from one or more data sources refer to the same entity. Many record linkage methods have been introduced and applied over the last decades. In general, the principle is to compare a range of available identifier fields in record pairs from different data sources in order to make a linkage decision. The Fellegi-Sunter probabilistic record linkage (PRL-FS) is one of the most commonly used methods. To obtain better performance, Winkler proposed an enhanced PRL-FS method (PRL-W) that takes field similarity into account, but its implementation requires the estimation of many more parameters, which complicates the task. Consequently, we propose a method that combines the best features of the PRL-FS and PRL-W methods: simplicity of parameter estimation and consideration of fields' similarities. We hypothesize that our record linkage method outperforms the PRL-FS and can achieve performance similar to that of the PRL-W. This paper briefly presents the PRL-FS and PRL-W methods, and describes in detail how to combine fields' similarity scores to create a novel record pair weight. Simulated data sets were used to assess and compare these three methods regarding their ability to reduce the rates of false matches and false non-matches.
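The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the paper's weight formula, so `linear_weight` only shows the general idea of a linear combination of per-field similarity scores. The function names, the m/u probabilities, the coefficients, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are all assumptions made for illustration. `fs_weight` follows the standard Fellegi-Sunter form, in which each field contributes log2(m/u) on agreement and log2((1-m)/(1-u)) on disagreement.

```python
import math
from difflib import SequenceMatcher


def fs_weight(agreements, m, u):
    """Standard Fellegi-Sunter pair weight from binary field agreements.

    agreements: dict field -> bool (True if the two records agree on the field)
    m, u: dicts field -> probability of agreement among true matches (m)
          and among true non-matches (u). Values here are illustrative.
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; an assumption, not the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def linear_weight(rec_a, rec_b, coeffs):
    """Hypothetical pair weight: linear combination of per-field similarity scores."""
    return sum(c * similarity(rec_a[f], rec_b[f]) for f, c in coeffs.items())


# Illustrative records and parameters (all values are made up for the sketch).
rec_a = {"last_name": "Martin", "first_name": "Claire", "birth_date": "1980-02-17"}
rec_b = {"last_name": "Martn", "first_name": "Claire", "birth_date": "1980-02-17"}

m = {"last_name": 0.95, "first_name": 0.90, "birth_date": 0.98}
u = {"last_name": 0.01, "first_name": 0.05, "birth_date": 0.002}
coeffs = {"last_name": 0.5, "first_name": 0.3, "birth_date": 0.2}

# Binary comparison treats the misspelled last name as a plain disagreement.
agreements = {f: rec_a[f] == rec_b[f] for f in rec_a}
binary_score = fs_weight(agreements, m, u)

# The similarity-based score still credits the near-identical last name.
similarity_score = linear_weight(rec_a, rec_b, coeffs)
```

In this sketch a one-character typo in the last name counts as a full disagreement under the binary comparison, while the similarity-based score penalizes it only slightly. The linear combination needs only one coefficient per field, which is the kind of simplicity in parameter estimation that the abstract refers to.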
Keywords :
data analysis; medical information systems; parameter estimation; probability; Fellegi-Sunter probabilistic record linkage; PRL-W method; data sources; empiric weight computation; enhanced PRL-FS method; false match rate reduction; false nonmatch rate reduction; field similarity consideration; identifier field; linearly combined field similarity scores; linkage decision; parameter estimation simplicity; record linkage methods; record pair weight; simulated data sets; Computational modeling; Couplings; Data models; Educational institutions; Estimation; Probabilistic logic; Signal processing algorithms;
Conference_Title :
Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE
Conference_Location :
Chicago, IL
DOI :
10.1109/EMBC.2014.6943848