DocumentCode :
262745
Title :
A tree-based WQI modeling approach for integrating Web databases
Author :
Marin-Castro, Heidy M. ; Sosa-Sosa, Victor J. ; Lopez-Arevalo, Ivan
Author_Institution :
Inf. Technol. Lab., Center of Res. & Adv. Studies of the Nat. Polytech. Inst., Ciudad Victoria, Mexico
fYear :
2014
fDate :
7-10 July 2014
Firstpage :
1
Lastpage :
8
Abstract :
Every day, more and more specialized databases (car rental, hotels, airfares, etc.) become available on the Web and can only be queried by means of a Web Query Interface (WQI). Since the number of domain-specific databases on the Web is increasing, it is becoming very complicated for end users to explore the information stored in them. In this context, research efforts are focused on building a single (unified) domain-specific WQI that allows users to query and integrate information available in different Web databases. The construction of such an integrated WQI for a given domain involves several complex tasks, especially the extraction, representation, understanding and mapping of the semantic content of each individual WQI associated with a Web database. Previous approaches have considered hierarchical models to build integrated WQIs, preserving the ancestor-descendant relationships in individual WQIs. In this work, we propose a novel tree-based approach for the automatic construction of a hierarchical model of the visual content of WQIs, representing their components in a clear and concise form. In the proposed approach, the Document Object Model (DOM) tree of each WQI considered in the integration process is processed by a specialized web resource to obtain relevant visual information in the WQI, such as fields (UIs), groups of UIs and super-groups, as well as their corresponding labels. This process is guided by a set of 8 design heuristic rules for the correct identification of labels and components. Experiments to evaluate the proposed strategy were conducted on the ICQ and Tel-8 datasets of the UIUC repository. Our results showed that the proposed tree-based approach for representing the visual components in a WQI achieves more than 94% accuracy, improving on currently reported approaches and easing the integration process of domain-specifi
Keywords :
database management systems; document handling; query processing; trees (mathematics); user interfaces; Document Object Model tree; ICQ dataset; Tel-8 dataset; UIUC repository; Web database integration; Web query interface; Web resource; ancestor-descendant relationships; domain-specific databases; heuristic rules; hierarchical models; semantic content extraction; semantic content mapping; semantic content representation; semantic content understanding; tree-based WQI modeling approach; Databases; Engines; HTML; Rendering (computer graphics); Semantics; Vectors; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Fusion (FUSION), 2014 17th International Conference on
Conference_Location :
Salamanca
Type :
conf
Filename :
6915979