Title :
Integrating HTML tables using semantic hierarchies and meta-data sets
Author :
Lim, Seung-Jin ; Ng, Yiu-Kai ; Yang, Xiaochun
Author_Institution :
Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
Abstract :
As the Internet is a global network, there is a demand for accessing closely related data without browsing through different Web documents. A significant amount of this data is presented in HTML documents. Because the data content of HTML documents is interleaved with markup, it is not trivial to integrate and provide a unified view of closely related data in different HTML documents. In this paper we present an approach for integrating semantically related data in any HTML tables that belong to a particular domain of interest (ID), such as house/apartment rental, by using the semantic hierarchies generated from the tables and the predefined meta-data sets that indicate related column names in ID. In our approach, we capture each data source as semi-structured data, called a semantic hierarchy, and the end result of integrating different HTML tables of ID is a unified view of the data in the tables, which is presented in an XML document. Besides HTML tables, our approach can be adopted by any system that integrates semi-structured data across different platforms.
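To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes one table per HTML document with a header row, and it reduces the paper's semantic hierarchies to a flat column mapping. The names `META_DATA_SET`, `TableParser`, `canonical`, and `integrate`, the rental column names, and the `listings`/`listing` XML element names are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical meta-data set for a rental domain:
# canonical column name -> related column names seen in source tables.
META_DATA_SET = {
    "rent": {"rent", "price", "monthly rent"},
    "bedrooms": {"bedrooms", "beds", "br"},
    "location": {"location", "city", "area"},
}


class TableParser(HTMLParser):
    """Collects table rows as lists of cell text (assumes one table per document)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def canonical(column):
    """Map a source column name to its canonical name, or None if unrelated."""
    key = column.strip().lower()
    for name, related in META_DATA_SET.items():
        if key in related:
            return name
    return None


def integrate(html_docs):
    """Merge the tables of several HTML documents into one XML element tree."""
    root = ET.Element("listings")
    for doc in html_docs:
        parser = TableParser()
        parser.feed(doc)
        parser.close()
        if not parser.rows:
            continue
        header, *body = parser.rows
        names = [canonical(h) for h in header]
        for row in body:
            item = ET.SubElement(root, "listing")
            for name, value in zip(names, row):
                if name is not None:  # skip columns outside the meta-data set
                    ET.SubElement(item, name).text = value
    return root


# Two sources that name the same attributes differently.
doc_a = "<table><tr><th>Price</th><th>Beds</th></tr><tr><td>700</td><td>2</td></tr></table>"
doc_b = "<table><tr><th>City</th><th>Monthly Rent</th></tr><tr><td>Provo</td><td>650</td></tr></table>"
print(ET.tostring(integrate([doc_a, doc_b]), encoding="unicode"))
```

In this sketch the meta-data set does the work of reconciling column names such as "Price" and "Monthly Rent", so both sources contribute a `rent` element to the unified XML view. Nested or spanning headers, which the paper's semantic hierarchies are meant to capture, are not handled here.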
Keywords :
Internet; hypermedia markup languages; information resources; information retrieval; meta data; HTML table integration; XML document; closely related data access; interest domain; meta data sets; related column names; semantic hierarchies; semistructured data; unified view data; Cities and towns; Computer science; Data analysis; Data warehouses; HTML; IP networks; Information analysis; Terminology; XML
Conference_Title :
Database Engineering and Applications Symposium, 2002. Proceedings. International
Print_ISBN :
0-7695-1638-6
DOI :
10.1109/IDEAS.2002.1029668