DocumentCode
3577869
Title
Extraction of relational schema from deep web sources: a form driven approach
Author
Saissi, Yasser ; Zellou, Ahmed ; Idri, Ali
Author_Institution
ENSIAS, Mohammed V Univ., Rabat, Morocco
fYear
2014
Firstpage
178
Lastpage
182
Abstract
The deep web is the largest unexplored part of the web, and we need direct access to all of its data sources without using any crawling or surfacing method. For this purpose, we choose to use a virtual web integration system. However, existing deep web virtual integration methods focus only on integrating the query interfaces that give access to the deep web. These query interfaces are integrated to build a global query interface able to query all the deep web sources. The objective of our work is to propose another vision of a deep web virtual integration system, one that uses a mediated schema built from relational schemas, each describing a deep web source. This paper presents our approach to extracting a relational schema describing a deep web source. The key idea underlying our approach is to analyze two pieces of structured information, the HTML form and the HTML table extracted from the deep web source, to discover its data structure and build a relational schema describing it. We also use a knowledge table to take advantage of our experience in extracting relational schemas from deep web sources.
Keywords
Internet; hypermedia markup languages; query processing; HTML form; HTML table; crawling method; data Web sources; data structure discovery; deep Web sources; deep Web virtual integration methods; form driven approach; global query interface; relational schema extraction; surfacing method; virtual Web integration system; Data integration; Data mining; Databases; Educational institutions; HTML; Search engines; Web pages; Deep web source; HTML form; Structured data; Web source integration;
fLanguage
English
Publisher
IEEE
Conference_Titel
Complex Systems (WCCS), 2014 Second World Conference on
Print_ISBN
978-1-4799-4648-8
Type
conf
DOI
10.1109/ICoCS.2014.7060888
Filename
7060888