Title :
A Dynamic Programming Algorithm for Name Matching
Author :
Top, Philip ; Dowla, Farid ; Gansemer, Jim
Author_Institution :
Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN
fDate :
March 1 2007-April 5 2007
Abstract :
In many database and data mining applications concerning people, name matching plays a key role. Many algorithms to match names have been proposed. These algorithms must take into account spelling and transcription errors, name abbreviations, nicknames, out of order names, and missing or extra names. The existing algorithms typically fall along the lines of sound based, edit distance based, or token based algorithms which can use other methods in matching each part of the name separately. In this article, we propose a dynamic programming approach that includes a substring matching algorithm. The algorithm´s performance is compared against two often used algorithms by testing on a random sample of names from a database. The data used for the testing comes from the DHS US-VISIT Arrival and Departure Information System database, which includes names from all over the world. The performance on this data set was compared with that of the Damerau-Levenshtein algorithm and the Jaro-Winkler algorithm. The dynamic programming algorithm with substring matching performed better than both of these algorithms on the data tested
Keywords :
data mining; database management systems; dynamic programming; string matching; DHS US-VISIT arrival-departure information system database; Damerau-Levenshtein algorithm; Jaro-Winkler algorithm; data mining; dynamic programming; name matching; substring matching algorithm; Application software; Computational intelligence; Costs; Data engineering; Data mining; Databases; Dynamic programming; Heuristic algorithms; Out of order; Testing;
Conference_Titel :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368923