DocumentCode
2162524
Title
Automate discovery of deep web interfaces
Author
Du, Xin ; Zheng, Yongqing ; Yan, Zhongmin
Author_Institution
School of Computer Science and Technology, Shandong University, Jinan, China
fYear
2010
fDate
4-6 Dec. 2010
Firstpage
3572
Lastpage
3575
Abstract
With the rapid growth of web sources, more and more deep web databases are becoming available. The information in these databases can only be accessed by submitting queries to the back-end databases. However, traditional search engine interfaces closely resemble deep web interfaces, so it is difficult to tell the two apart and to find deep web interfaces. This paper proposes a novel method for discovering deep web interfaces. We introduce a page division method that divides pages into separate parts, and then remove the parts that don't contain search interfaces. Finally, we construct topic-specific queries to obtain results and identify deep web interfaces by analyzing those results. Experimental results show that the method is effective and stable.
Keywords
Accuracy; Crawlers; Databases; HTML; Layout; Web pages; Deep Web; Interface Extraction; Tag Trees;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location
Hangzhou, China
Print_ISBN
978-1-4244-7616-9
Type
conf
DOI
10.1109/ICISE.2010.5691802
Filename
5691802