• DocumentCode
    139446
  • Title

    An empiric weight computation for record linkage using linearly combined fields' similarity scores

  • Author

    Xinran Li ; Guttmann, Aline ; Demongeot, Jacques ; Boire, Jean-Yves ; Ouchchane, Lemlih

  • Author_Institution
    ISIT, Auvergne Univ., Clermont-Ferrand, France
  • fYear
    2014
  • fDate
    26-30 Aug. 2014
  • Firstpage
    1346
  • Lastpage
    1349
  • Abstract
    Record linkage is the task of identifying which records from one or more data sources refer to the same entity. Many record linkage methods have been introduced and applied over the last decades. In general, the principle is to compare a range of available identifier fields in record pairs from different data sources in order to make a linkage decision. The Fellegi-Sunter probabilistic record linkage (PRL-FS) is one of the most commonly used methods. To obtain better performance, Winkler proposed an enhanced PRL-FS method (PRL-W) that takes field similarity into account, but its implementation requires the estimation of many more parameters, which complicates the task. Consequently, we propose a method that combines the best features of the PRL-FS and PRL-W methods: simplicity of parameter estimation and consideration of fields' similarities. We hypothesize that our record linkage method outperforms the PRL-FS and can achieve performance similar to that of the PRL-W. This paper briefly presents the PRL-FS and PRL-W methods and describes in detail how to combine fields' similarity scores to create a novel record pair weight. Simulated data sets were used to assess and compare these three methods regarding their ability to reduce the rates of false matches and false non-matches.
  • Keywords
    data analysis; medical information systems; parameter estimation; probability; Fellegi-Sunter probabilistic record linkage; PRL-W method; data sources; empiric weight computation; enhanced PRL-FS method; false match rate reduction; false nonmatch rate reduction; field similarity consideration; identifier field; linearly combined field similarity scores; linkage decision; parameter estimation simplicity; record linkage methods; record pair weight; simulated data sets; Computational modeling; Couplings; Data models; Educational institutions; Estimation; Probabilistic logic; Signal processing algorithms;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE
  • Conference_Location
    Chicago, IL
  • ISSN
    1557-170X
  • Type

    conf

  • DOI
    10.1109/EMBC.2014.6943848
  • Filename
    6943848
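
The abstract's idea of comparing identifier fields and combining their similarity scores into a record pair weight can be illustrated with a minimal sketch. It is not the paper's formula, which the abstract does not give. The agreement and disagreement weights follow the standard Fellegi-Sunter definitions, log2(m/u) and log2((1-m)/(1-u)). Everything else is an assumption made for illustration: the function names, the field names, the m and u values, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the linear interpolation between the two weights.

```python
import math
from difflib import SequenceMatcher


def fs_weights(m, u):
    """Standard Fellegi-Sunter agreement and disagreement weights (log base 2).

    m: probability that the field agrees given a true match.
    u: probability that the field agrees given a true non-match.
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def similarity(a, b):
    """String similarity in [0, 1]; an illustrative stand-in for any comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_weight(rec_a, rec_b, params):
    """Hypothetical record pair weight: a sum of per-field weights.

    Each field weight is interpolated linearly by the similarity score s,
    so s = 1 yields the agreement weight and s = 0 the disagreement weight.
    This interpolation is an assumption, not the method of the paper.
    """
    total = 0.0
    for field, (m, u) in params.items():
        agree, disagree = fs_weights(m, u)
        s = similarity(rec_a[field], rec_b[field])
        total += disagree + s * (agree - disagree)
    return total


# Assumed (m, u) values per field, chosen only for the example.
params = {"last_name": (0.9, 0.1), "first_name": (0.9, 0.1)}

rec_1 = {"last_name": "Martin", "first_name": "Claire"}
rec_2 = {"last_name": "Martyn", "first_name": "Claire"}

print(pair_weight(rec_1, rec_1, params))  # identical records
print(pair_weight(rec_1, rec_2, params))  # one field nearly agrees
```

With a binary agree/disagree comparison, the near match on `last_name` would receive the full disagreement weight. The interpolated version places it between the two extremes, which is the general motivation for taking field similarity into account.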