• DocumentCode
    3507770
  • Title
    Approximate Address Matching
  • Author
    Li, Dengyue ; Wang, Shengrui ; Mei, Zhen
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sherbrooke, Sherbrooke, QC, Canada
  • fYear
    2010
  • fDate
    4-6 Nov. 2010
  • Firstpage
    264
  • Lastpage
    269
  • Abstract
    Address management is a major challenge for many organizations, as errors occur frequently in the address capturing process, and address standards and usages may vary among different databases. Rather than comparing house number, street, city and province individually, we use a string similarity measurement to perform address comparison, which enables us to combine the edit distance with the vector space model to search for potentially matching address candidates by associating them with a similarity matching score. Upon evaluating the strengths and weaknesses of these techniques, we introduce an algorithm for effective address matching, called Term-Weighted Dissimilarity, which combines edit distance similarity with Term Frequency-Inverse Document Frequency weighting. We implement this algorithm in software and show its effectiveness via a real application for address matching and correction based on Canada Post's address standard.
  • Keywords
    geographic information systems; information retrieval; string matching; text analysis; address capturing process; address comparison; address management; address matching; address standards; edit distance; string similarity measurement; term frequency-inverse document frequency weighting; term-weighted dissimilarity; vector space model; Address matching; TF-IDF weight; address correction; address standardization; edit distance; string similarity; vector space model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2010 International Conference on
  • Conference_Location
    Fukuoka
  • Print_ISBN
    978-1-4244-8538-3
  • Electronic_ISBN
    978-0-7695-4237-9
  • Type
    conf
  • DOI
    10.1109/3PGCIC.2010.43
  • Filename
    5662779