DocumentCode
2725185
Title
A Dynamic Programming Algorithm for Name Matching
Author
Top, Philip ; Dowla, Farid ; Gansemer, Jim
Author_Institution
Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN
fYear
2007
fDate
March 1 2007-April 5 2007
Firstpage
547
Lastpage
551
Abstract
In many database and data mining applications concerning people, name matching plays a key role. Many algorithms to match names have been proposed. These algorithms must take into account spelling and transcription errors, name abbreviations, nicknames, out of order names, and missing or extra names. The existing algorithms typically fall along the lines of sound based, edit distance based, or token based algorithms which can use other methods in matching each part of the name separately. In this article, we propose a dynamic programming approach that includes a substring matching algorithm. The algorithm´s performance is compared against two often used algorithms by testing on a random sample of names from a database. The data used for the testing comes from the DHS US-VISIT Arrival and Departure Information System database, which includes names from all over the world. The performance on this data set was compared with that of the Damerau-Levenshtein algorithm and the Jaro-Winkler algorithm. The dynamic programming algorithm with substring matching performed better than both of these algorithms on the data tested
Keywords
data mining; database management systems; dynamic programming; string matching; DHS US-VISIT arrival-departure information system database; Damerau-Levenshtein algorithm; Jaro-Winkler algorithm; data mining; dynamic programming; name matching; substring matching algorithm; Application software; Computational intelligence; Costs; Data engineering; Data mining; Databases; Dynamic programming; Heuristic algorithms; Out of order; Testing;
fLanguage
English
Publisher
ieee
Conference_Titel
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location
Honolulu, HI
Print_ISBN
1-4244-0705-2
Type
conf
DOI
10.1109/CIDM.2007.368923
Filename
4221347
Link To Document