DocumentCode :
2162524
Title :
Automate discovery of deep web interfaces
Author :
Du, Xin ; Zheng, Yongqing ; Yan, Zhongmin
Author_Institution :
School of Computer Science and Technology, Shandong University, Jinan, China
fYear :
2010
fDate :
4-6 Dec. 2010
Firstpage :
3572
Lastpage :
3575
Abstract :
With the rapid growth of web sources, more and more deep web databases are becoming available. The information in these databases can be accessed only by submitting queries to the back-end databases. However, traditional search engine interfaces closely resemble deep web interfaces, so it is difficult to distinguish the two and to find deep web interfaces. This paper proposes a novel method for discovering deep web interfaces. We introduce a page division method that divides pages into separate parts, and we then remove the parts that don't contain search interfaces. Finally, we construct topic-specific queries to obtain results and distinguish deep web interfaces by analyzing those results. Experimental results show that this method is effective and stable.
Keywords :
Accuracy; Crawlers; Databases; HTML; Layout; Web pages; Deep Web; Interface Extraction; Tag Trees;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
Type :
conf
DOI :
10.1109/ICISE.2010.5691802
Filename :
5691802