• DocumentCode
    2192018
  • Title
    Integrating HTML tables using semantic hierarchies and meta-data sets
  • Author
    Lim, Seung-Jin ; Ng, Yiu-Kai ; Yang, Xiaochun
  • Author_Institution
    Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    160
  • Lastpage
    169
  • Abstract
    As the Internet is a global network, there is demand for accessing closely related data without browsing through different Web documents. A significant amount of this data is presented in HTML documents. Since the data contents of HTML documents are interleaved with markup, it is not trivial to integrate closely related data from different HTML documents and provide a unified view of it. In this paper we present an approach for integrating semantically related data in any HTML tables that belong to a particular domain of interest (ID), such as house/apartment rental, by using the semantic hierarchies generated from the tables and predefined meta-data sets that indicate related column names in ID. In our approach, we capture each data source as semi-structured data, called a semantic hierarchy, and the end result of integrating different HTML tables of ID is a unified view of the data in the tables, presented as an XML document. Besides HTML tables, our approach can be adopted by any system that integrates semi-structured data across different platforms.
  • Keywords
    Internet; hypermedia markup languages; information resources; information retrieval; meta data; HTML table integration; XML document; closely related data access; interest domain; meta data sets; related column names; semantic hierarchies; semistructured data; unified view data; Cities and towns; Computer science; Data analysis; Data warehouses; HTML; IP networks; Information analysis; Terminology; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Engineering and Applications Symposium, 2002. Proceedings. International
  • ISSN
    1098-8068
  • Print_ISBN
    0-7695-1638-6
  • Type
    conf
  • DOI
    10.1109/IDEAS.2002.1029668
  • Filename
    1029668