• DocumentCode
    3520856
  • Title
    A Holistic Solution for Duplicate Entity Identification in Deep Web Data Integration
  • Author
    Liu, Wei ; Meng, Xiaofeng
  • Author_Institution
    Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • fYear
    2010
  • fDate
    1-3 Nov. 2010
  • Firstpage
    267
  • Lastpage
    274
  • Abstract
    The proliferation of the deep Web offers users a great opportunity to search for high-quality information on the Web. As a necessary step in deep Web data integration, duplicate entity identification aims to discover duplicate records across the integrated Web databases for further applications (e.g., price-comparison services). However, most existing work addresses this issue only between two data sources, which is not practical for deep Web data integration systems. That is, a duplicate entity matcher trained over two specific Web databases cannot be applied to other Web databases. In addition, the cost of preparing the training set for n Web databases is C_n^2 times higher than that for two Web databases. In this paper, we propose a holistic solution to address the new challenges posed by the deep Web, whose goal is to build one duplicate entity matcher over multiple Web databases. Extensive experiments on two domains show that the proposed solution is highly effective for deep Web data integration.
  • Keywords
    Internet; deep Web data integration; duplicate entity identification; integrated Web databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantics Knowledge and Grid (SKG), 2010 Sixth International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-8125-5
  • Electronic_ISBN
    978-0-7695-4189-1
  • Type
    conf
  • DOI
    10.1109/SKG.2010.38
  • Filename
    5663520