Title :
Extraction of relational schema from deep web sources: a form driven approach
Author :
Saissi, Yasser ; Zellou, Ahmed ; Idri, Ali
Author_Institution :
ENSIAS, Mohammed V Univ., Rabat, Morocco
Abstract :
The deep web is the largest unexplored part of the web, and we need to access all of its data sources directly, without using any crawling or surfacing method. For this, we choose to use a virtual web integration system. However, the deep web virtual integration methods existing today focus only on integrating the query interfaces that give access to the deep web. These query interfaces are integrated to build a global query interface able to query all the deep web sources. The objective of our work is to propose another vision of a deep web virtual integration system, one that uses a mediated schema built from relational schemas, each describing one deep web source. This paper presents our approach to extracting a relational schema that describes a deep web source. The key idea underlying our approach is to analyze two pieces of structured information extracted from the deep web source, the HTML form and the HTML table, in order to discover the source's data structure and build a relational schema describing it. We also use a knowledge table to take advantage of our accumulated experience in extracting relational schemas from deep web sources.
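The form-and-table idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the class `FormTableParser`, the function `sketch_schema`, the sample page, and the rule of merging form field names with table header labels into a single relation are all assumptions made for illustration, and the knowledge table mentioned in the abstract is left out entirely.

```python
from html.parser import HTMLParser


class FormTableParser(HTMLParser):
    """Hypothetical helper: collects form field names and <th> labels."""

    def __init__(self):
        super().__init__()
        self.form_fields = []    # names of query fields found inside a <form>
        self.table_headers = []  # text of <th> cells in result tables
        self._in_form = False
        self._in_th = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
        elif tag in ("input", "select", "textarea") and self._in_form:
            name = attrs.get("name")
            # Assumption: submit, hidden and button inputs do not describe data.
            if name and attrs.get("type") not in ("submit", "hidden", "button"):
                self.form_fields.append(name)
        elif tag == "th":
            self._in_th = True
            self.table_headers.append("")

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False
        elif tag == "th":
            self._in_th = False

    def handle_data(self, data):
        if self._in_th:
            self.table_headers[-1] += data.strip()


def sketch_schema(html, relation="source"):
    """Merge form fields and table headers into one candidate relation."""
    parser = FormTableParser()
    parser.feed(html)
    attributes = []
    for name in parser.form_fields + parser.table_headers:
        key = name.strip().lower().replace(" ", "_")
        if key and key not in attributes:  # drop duplicates, keep order
            attributes.append(key)
    return {relation: attributes}


# Invented sample page: a query form plus a result table.
SAMPLE = """
<form action="/search">
  <input type="text" name="title">
  <select name="author"></select>
  <input type="submit" name="go">
</form>
<table>
  <tr><th>Title</th><th>Author</th><th>Year</th></tr>
  <tr><td>Dune</td><td>Herbert</td><td>1965</td></tr>
</table>
"""

print(sketch_schema(SAMPLE, relation="book"))
```

In this sketch the form contributes the attributes a user can query on, while the result table contributes attributes that appear only in the answers (here, `year`). A real extractor would also need to infer types and keys, which this example does not attempt.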
Keywords :
Internet; hypermedia markup languages; query processing; HTML form; HTML table; crawling method; data Web sources; data structure discovery; deep Web sources; deep Web virtual integration methods; form driven approach; global query interface; relational schema extraction; surfacing method; virtual Web integration system; Data integration; Data mining; Databases; Educational institutions; HTML; Search engines; Web pages; Deep web source; HTML form; Structured data; Web source integration;
Conference_Title :
Complex Systems (WCCS), 2014 Second World Conference on
Print_ISBN :
978-1-4799-4648-8
DOI :
10.1109/ICoCS.2014.7060888