DocumentCode :
2651484
Title :
Complete Coverage for Approximate String Matching in Record Linkage Using Bit Vectors
Author :
Schraagen, Marijn
Author_Institution :
Leiden Inst. of Adv. Comput. Sci., Leiden Univ., Leiden, Netherlands
fYear :
2011
fDate :
7-9 Nov. 2011
Firstpage :
740
Lastpage :
747
Abstract :
Research in social history is increasingly influenced by the availability of digitized sources. Tools have to be developed to access these sources efficiently. This paper describes a tool that performs family reconstruction using record linkage: linking historical civil certificates based on record similarity. Most current approaches in record linkage apply heuristics to limit the number of similarity computations at the expense of linking coverage. This paper describes a binary-tree-based indexing approach that provides complete coverage within practical time bounds. The indexing scheme is constructed using a simulated annealing algorithm to optimize indexing efficiency. A comparison to other methods, both heuristic and complete-coverage, is provided. The method is developed for Levenshtein edit distance; however, an extension to other similarity measures is feasible. As an example, an extension to Jaro distance is discussed.
Keywords :
indexing; records management; simulated annealing; string matching; trees (mathematics); Levenshtein edit distance; approximate string matching; binary tree based indexing approach; bit vectors; record linkage; simulated annealing algorithm; social history; Couplings; Current measurement; Indexing; Pattern matching; Vectors; approximate matching; family reconstruction; record linkage; simulated annealing; tree indexing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location :
Boca Raton, FL
ISSN :
1082-3409
Print_ISBN :
978-1-4577-2068-0
Electronic_ISBN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2011.116
Filename :
6103407