DocumentCode :
3507770
Title :
Approximate Address Matching
Author :
Li, Dengyue ; Wang, Shengrui ; Mei, Zhen
Author_Institution :
Dept. of Comput. Sci., Univ. of Sherbrooke, Sherbrooke, QC, Canada
fYear :
2010
fDate :
4-6 Nov. 2010
Firstpage :
264
Lastpage :
269
Abstract :
Address management is a major challenge for many organizations, as errors occur frequently in the address capturing process, and address standards and usages may vary among different databases. Rather than comparing house number, street, city and province individually, we use a string similarity measurement to perform address comparison, which enables us to combine the edit distance with the vector space model to search for potentially matching address candidates by associating them with a similarity matching score. Upon evaluating the strengths and weaknesses of these techniques, we introduce an algorithm for effective address matching, called Term-Weighted Dissimilarity, which combines edit distance similarity with Term Frequency-Inverse Document Frequency weighting. We implement this algorithm in software and show its effectiveness via a real application for address matching and correction based on Canada Post's address standard.
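The abstract does not give the formula for Term-Weighted Dissimilarity, so the following is only a minimal sketch of the general idea it describes: scoring whole address strings by combining a per-token edit distance similarity with an IDF-style term weight. All function names, the tokenization, the smoothed IDF formula, and the sample addresses are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_similarity(a: str, b: str) -> float:
    """Edit distance normalized into a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def tokenize(address: str) -> list[str]:
    """Assumed tokenization: lowercase, commas treated as spaces."""
    return address.lower().replace(",", " ").split()


def idf_weights(corpus: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency per token (an assumed variant)."""
    n = len(corpus)
    df = Counter(tok for addr in corpus for tok in set(tokenize(addr)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def weighted_similarity(query: str, candidate: str, idf: dict[str, float]) -> float:
    """IDF-weighted average of each query token's best edit similarity."""
    q_tokens, c_tokens = tokenize(query), tokenize(candidate)
    if not q_tokens or not c_tokens:
        return 0.0
    default_idf = max(idf.values(), default=1.0)  # unseen tokens count as rare
    total = weight_sum = 0.0
    for q in q_tokens:
        best = max(c_tokens, key=lambda c: token_similarity(q, c))
        # Weight by the matched reference token, since the query token
        # may be misspelled and therefore absent from the IDF table.
        w = idf.get(best, default_idf)
        total += w * token_similarity(q, best)
        weight_sum += w
    return total / weight_sum


def rank_candidates(query: str, corpus: list[str]) -> list[tuple[str, float]]:
    """Return reference addresses sorted by descending similarity score."""
    idf = idf_weights(corpus)
    scored = [(addr, weighted_similarity(query, addr, idf)) for addr in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference addresses and a misspelled, abbreviated query.
reference = [
    "2500 Boulevard de l'Universite, Sherbrooke, QC",
    "2500 Boulevard Laurier, Quebec, QC",
    "150 Rue King Ouest, Sherbrooke, QC",
]
ranking = rank_candidates("2500 Boul. de l'Universitee Sherbroke QC", reference)
```

In this sketch, rare tokens such as a street name contribute more to the score than tokens shared by most addresses (for example the province code), while the edit similarity lets misspelled or abbreviated tokens still match. A production matcher would also need address standardization rules, which are outside the scope of this illustration.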
Keywords :
geographic information systems; information retrieval; string matching; text analysis; address capturing process; address comparison; address management; address matching; address standards; edit distance; string similarity measurement; term frequency-inverse document frequency weighting; term-weighted dissimilarity; vector space model; Address matching; TF-IDF weight; address correction; address standardization; edit distance; string similarity; vector space model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2010 International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4244-8538-3
Electronic_ISBN :
978-0-7695-4237-9
Type :
conf
DOI :
10.1109/3PGCIC.2010.43
Filename :
5662779