• DocumentCode
    2233506
  • Title

    An Approach of Standardization and Searching based on Hierarchical Bayesian Clustering (HBC) for Record Linkage System

  • Author

    Tun, Zin War ; Thein, Nilar

  • Author_Institution
    Univ. of Computer Studies, Yangon
  • fYear
    2007
  • fDate
    24-26 Jan. 2007
  • Firstpage
    54
  • Lastpage
    60
  • Abstract
    Information sources on the Web use different text formats and contain varying inconsistencies. Data from many online sources do not contain enough information to link the records accurately. To link records from different data sources, a system must identify the entities these sources have in common. The major challenges in record linkage are therefore computational complexity and linkage accuracy. To reduce the number of record pairs to compare, record linkage uses similarity search techniques to find candidate similar records. Various searching methods have been used in record linkage systems. In this paper, we propose a record linkage framework that focuses on standardization and enhances the searching method by adopting a cluster-based searching method called Hierarchical Bayesian Clustering (HBC), which not only makes record pair comparison more efficient but also improves record linkage accuracy. The purpose of this method is to place similar records into clusters, which restricts the search scope for record comparison and also enhances matching accuracy.
  • Keywords
    Bayes methods; Internet; computational complexity; pattern clustering; standardisation; World Wide Web; computational complexity; data sources; hierarchical Bayesian clustering; information sources; linkage accuracy; record linkage system; searching methods; standardization; Bayesian methods; Cleaning; Couplings; Data mining; Databases; Indexing; Information retrieval; Machine learning; Military computing; Standardization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Creating, Connecting and Collaborating through Computing, 2007. C5 '07. The Fifth International Conference on
  • Conference_Location
    Kyoto
  • Print_ISBN
    0-7695-2806-6
  • Type

    conf

  • DOI
    10.1109/C5.2007.5
  • Filename
    4144934
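
The abstract's idea of standardizing records and then comparing only records that fall into the same cluster can be illustrated with a minimal sketch. This is not the paper's HBC algorithm: the cluster assignment below is a deliberately crude stand-in (the alphabetically last token of the standardized name), and the names `standardize`, `cluster_key`, `candidate_pairs`, and the sample records are all hypothetical, chosen only to show how clustering shrinks the set of record pairs.

```python
from collections import defaultdict
from itertools import combinations


def standardize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def cluster_key(name: str) -> str:
    """Hypothetical stand-in for a cluster assignment (NOT HBC):
    the alphabetically last token of the standardized name."""
    tokens = standardize(name).split()
    return max(tokens) if tokens else ""


def candidate_pairs(records):
    """Return index pairs of records that share a cluster key."""
    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[cluster_key(rec)].append(idx)
    pairs = []
    for members in clusters.values():
        # Only records inside the same cluster are ever compared.
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


# Hypothetical sample records with inconsistent formatting.
records = ["Zin War Tun", "TUN, Zin-War", "Nilar Thein", "N. Thein", "Kyoto Univ."]
pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2  # exhaustive comparison count
print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {pairs}")
```

In this toy data the clustering step leaves two candidate pairs out of the ten possible ones. A real system would replace `cluster_key` with an actual clustering model and follow the candidate step with a field-by-field similarity comparison.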