• DocumentCode
    1040427
  • Title

    A Method for Estimating the Precision of Placename Matching

  • Author

    Doerr, Martin ; Papagelis, Manos

  • Author_Institution
    Found. of Res. & Technol., Crete
  • Volume
    19
  • Issue
    8
  • fYear
    2007
  • Firstpage
    1089
  • Lastpage
    1101
  • Abstract
    Information in digital libraries and information systems frequently refers to locations or objects in geographic space. Digital gazetteers are commonly employed to match the referred placenames with actual locations in information integration and data cleaning procedures. This process may fail due to missing information in the gazetteer, multiple matches, or false positive matches. We have analyzed the cases of success and reasons for failure of the mapping process to a gazetteer. Based on these, we present a statistical model that permits estimating 1) the completeness of a gazetteer with respect to the specific target area and application, 2) the expected precision and recall of one-to-one mappings of source placenames to the gazetteer, 3) the semantic inconsistency that remains in one-to-one mappings, and 4) the degree to which the precision and recall are improved under knowledge of the identity of higher levels in a hierarchy of places. The presented model is based on statistical analysis of the mapping process of a large set of placenames itself and does not require any other background data. The statistical model assumes that a gazetteer is populated by a stochastic process. The paper discusses how future work could take deviations from this assumption into account. The method has been applied to a real case.
  • Keywords
    data mining; digital libraries; estimation theory; data cleaning; digital gazetteers; information integration; information systems; placename matching; precision estimation; semantic inconsistency; statistical analysis; statistical model; Cleaning; Control systems; Data engineering; Databases; Failure analysis; Information systems; Software libraries; Statistical analysis; Stochastic processes; Terminology; Data mapping; data translation; database integration; knowledge and data engineering tools and techniques
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2007.1033
  • Filename
    4262538