DocumentCode :
1767011
Title :
Implementation of an extended Fellegi-Sunter probabilistic record linkage method using the Jaro-Winkler string comparator
Author :
Li, Xinran ; Guttmann, Aline ; Cipiere, Sebastien ; Maigne, Lydia ; Demongeot, Jacques ; Boire, Jean-Yves ; Ouchchane, Lemlih
Author_Institution :
ISIT, Auvergne Univ., Clermont-Ferrand, France
fYear :
2014
fDate :
1-4 June 2014
Firstpage :
375
Lastpage :
379
Abstract :
Record linkage is the task of identifying which records from one or more data sources refer to the same person. Records often lack a common key and may contain typographical variations in identifier fields; in such cases, Fellegi-Sunter probabilistic record linkage is a commonly used method. In this method, a weight is assigned to each pair of records, and record pairs with weights above a given threshold are considered matches. Winkler introduced an extension of the Fellegi-Sunter method that takes field similarity into account when calculating the weight, and showed that it outperforms the original method. Implementations of the Fellegi-Sunter method are frequently presented in the literature; the application of the Winkler method, however, is rarely mentioned. This paper gives brief background on these two record linkage methods and describes in detail how to implement the Winkler method. We formalized and then estimated the required parameters of the Winkler method using the expectation-maximization (EM) algorithm. Simulated data sets, with known truth of the matches, were used to assess parameter estimation and to compare the Winkler and Fellegi-Sunter methods regarding their ability to reduce the rates of false matches and false non-matches.
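The weight-and-threshold step described in the abstract can be illustrated with a minimal sketch of the classic Fellegi-Sunter weight under exact field agreement. This is not the authors' implementation: the field names, the m- and u-probabilities, and the threshold below are hypothetical values chosen for illustration, and the function names are invented here. The Winkler extension would replace the exact-equality test with a similarity-dependent weight (for example, based on the Jaro-Winkler comparator); that step is omitted because this record does not give its formula.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
M_PROBS = {"last_name": 0.95, "first_name": 0.90, "birth_year": 0.98}
U_PROBS = {"last_name": 0.01, "first_name": 0.05, "birth_year": 0.02}


def field_weight(agrees, m, u):
    """Log2 likelihood ratio contributed by a single field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))


def pair_weight(rec_a, rec_b):
    """Sum the field weights for a record pair, using exact agreement."""
    return sum(
        field_weight(rec_a[f] == rec_b[f], M_PROBS[f], U_PROBS[f])
        for f in M_PROBS
    )


def classify(rec_a, rec_b, threshold=0.0):
    """Declare a match when the total weight exceeds the threshold."""
    return pair_weight(rec_a, rec_b) > threshold


if __name__ == "__main__":
    a = {"last_name": "Martin", "first_name": "Anne", "birth_year": 1980}
    b = {"last_name": "Martyn", "first_name": "Anne", "birth_year": 1980}
    print(pair_weight(a, b), classify(a, b))
```

In practice the m- and u-probabilities would be estimated from the data, for instance with the EM algorithm as the abstract states, rather than fixed by hand as in this sketch.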
Keywords :
data analysis; expectation-maximisation algorithm; medical computing; medical information systems; parameter estimation; EM; Fellegi-Sunter methods; Jaro-Winkler string comparator; Winkler method; common key; data sources; expectation-maximization algorithm; extended Fellegi-Sunter probabilistic record linkage method; false match rates; false nonmatches; field similarity; identifier fields; match truth; parameter estimation; record linkage methods; record pairs; simulated data sets; typographical variations; weight calculation; Accuracy; Computational modeling; Couplings; Databases; Educational institutions; Estimation; Probabilistic logic;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical and Health Informatics (BHI), 2014 IEEE-EMBS International Conference on
Conference_Location :
Valencia
Type :
conf
DOI :
10.1109/BHI.2014.6864381
Filename :
6864381