DocumentCode
2491750
Title
Domain-oriented Deep Web Data Sources' Discovery and Identification
Author
Li, Yingjun ; Nie, Tiezheng ; Shen, Derong ; Yu, Ge
Author_Institution
Inst. of Comput. Software & Theor., Northeastern Univ. Shenyang, Shenyang, China
fYear
2010
fDate
6-8 April 2010
Firstpage
464
Lastpage
467
Abstract
Because the Deep Web contains a tremendous number of well-structured data sources, integrating them has become an active research topic. Accurately discovering and identifying the Deep Web data sources related to a specific domain is a key issue. We propose a Domain-Oriented Deep Web data source Discovery method (DO-DWD) and a novel Domain Identification strategy for Deep Web data sources (DIDW). In the discovery stage, we use machine learning algorithms and heuristic rules to find the query interfaces of data sources. In the identification stage, we identify the Deep Web data sources associated with the domain by calculating the relevance between a query interface and the domain based on semantic similarity. Finally, extensive experiments on a real data set show that DO-DWD and DIDW achieve high correctness and accuracy.
Keywords
Internet; learning (artificial intelligence); query processing; user interfaces; data integration; data sources; domain identification strategy; domain-oriented deep Web data source discovery; machine learning algorithms; query interfaces; semantic similarity; Data engineering; Data mining; Databases; Educational institutions; Information science; Internet; Machine learning algorithms; Probes; Radio control; Software;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Conference (APWEB), 2010 12th International Asia-Pacific
Conference_Location
Busan
Print_ISBN
978-1-7695-4012-2
Electronic_ISBN
978-1-4244-6600-9
Type
conf
DOI
10.1109/APWeb.2010.54
Filename
5474088