• DocumentCode
    3147492
  • Title

    A fast entity resolution method based on wave of records

  • Author

    Liu, Yongnan ; Wang, Hongzhi ; Gao, Hong

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2011
  • fDate
    16-18 April 2011
  • Firstpage
    4642
  • Lastpage
    4645
  • Abstract
    Given a large data collection, entity resolution is the task of finding the records that refer to the same entity. A crucial step in entity resolution is computing the similarity between records. Without careful design, an algorithm may have to compare every character in two records only to obtain a small similarity value. In this paper, we propose a novel method based on waves of records. A wave is a sequence of character frequencies in which equal frequencies of different characters are treated as distinct. The Wave structure in our algorithm sharply reduces the number of comparisons needed to compute similarity through two techniques: filtering out record pairs whose waves are not similar, and estimating the maximum similarity that the remaining part of the records can still contribute, so that when this estimate is too small the algorithm ends the computation as early as possible without introducing false negatives. We demonstrate the effectiveness of our algorithm through a thorough experimental evaluation on real-life data sets.
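    The two techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's similarity measure or its exact Wave layout, so everything here is an assumption made for illustration: a wave is modelled as a character-frequency table, similarity is the multiset character overlap divided by the longer record's length, and the names `wave` and `bounded_similarity` are hypothetical.

```python
from collections import Counter


def wave(record: str) -> Counter:
    """Assumed wave: a table of (character, frequency) entries.

    Keying by character keeps equal frequencies of different
    characters distinct, as the abstract describes.
    """
    return Counter(record)


def bounded_similarity(a: str, b: str, threshold: float):
    """Return the overlap similarity of a and b, or None if it is below threshold.

    Assumed measure (not taken from the paper):
    sum over characters of min(count_a, count_b), divided by max(len(a), len(b)).
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty records are treated as identical
    needed = threshold * longest

    # Technique 1 (filter): the overlap can never exceed the shorter
    # record's length, so hopeless pairs are rejected before any counting.
    if min(len(a), len(b)) < needed:
        return None

    wa, wb = wave(a), wave(b)
    matched = 0
    remaining = len(a)  # characters of a not yet accounted for
    for ch, count in wa.items():
        matched += min(count, wb[ch])
        remaining -= count
        # Technique 2 (early exit): matched + remaining is an upper bound
        # on the final overlap, so stopping here cannot drop a true match.
        if matched + remaining < needed:
            return None
    return matched / longest
```

    Because both checks compare an upper bound on the final overlap against the threshold, a pair is only discarded when it could not have qualified, which mirrors the "without false negatives" property claimed in the abstract. For example, `bounded_similarity("john smith", "jon smith", 0.8)` runs to completion, while `bounded_similarity("john smith", "mary jones", 0.8)` stops before the last character entry is examined.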
  • Keywords
    data handling; data collection; entity resolution method; record pair filtering; record wave; wave structure; Algorithm design and analysis; Clustering algorithms; Complexity theory; Databases; Filtering algorithms; Heuristic algorithms; Nickel; entity resolution; signature generation; similarity computation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Consumer Electronics, Communications and Networks (CECNet), 2011 International Conference on
  • Conference_Location
    XianNing
  • Print_ISBN
    978-1-61284-458-9
  • Type

    conf

  • DOI
    10.1109/CECNET.2011.5768200
  • Filename
    5768200