• DocumentCode
    2261131
  • Title
    Dynamically Constructing a Global Schema for Web Entities
  • Author
    Xu, Xiuxing ; Li, Qingzhong ; Dong, Yongquan ; Ding, Yanhui
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Firstpage
    127
  • Lastpage
    131
  • Abstract
    With the rapid development of the Internet, popular entities have more and more instances on the Web. On the one hand, different instances of the same Web entity often contain different attributes, and different instances often use different labels for the same attribute; on the other hand, new Web entity instances containing new attributes and labels keep appearing on the Web. Therefore, it is difficult to dynamically construct a global schema for the Web entities of a given entity type, although such a schema is highly desired in Web entity instance detection, extraction and integration. In this paper, we propose a novel approach to dynamically construct a global schema for the Web entities of a given entity type. First, an SVM (support vector machine) classification model is built from the Web entity instances that have been extracted from related Web pages. Then, based on this model, a global schema discovery approach is provided to dynamically construct the global schema for the target entity type. Experimental results on Chinese Web sites show that the approach is general and effective.
  • Keywords
    Internet; Web sites; data mining; support vector machines; Information extraction; Information integration; SVM; Web entity; Web pages; global schema; Classification algorithms; Construction industry; Training; Web Entities; Web Information Integration
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Systems and Applications Conference (WISA), 2010 7th
  • Conference_Location
    Hohhot
  • Print_ISBN
    978-1-4244-8440-9
  • Type
    conf
  • DOI
    10.1109/WISA.2010.32
  • Filename
    5581387