DocumentCode :
2547961
Title :
LCA-Based Keyword Search for Effectively Retrieving "Information Unit" from Web Pages
Author :
Song, Xiaoming ; Feng, Jianhua ; Li, Guoliang ; Hong, Qin
Author_Institution :
Tsinghua Univ., Beijing
fYear :
2008
fDate :
20-22 July 2008
Firstpage :
31
Lastpage :
36
Abstract :
With the rapid development of Internet technology, structured data are increasingly prevalent on the Internet. Moreover, most Web sites organize their data systematically, and relevant data may be spread across different pages that are connected through hyperlinks. However, existing Web search engines cannot integrate information from multiple interrelated pages to answer keyword queries meaningfully. Next-generation Web search engines require link-awareness or, more generally, the capability of integrating correlated information items that are linked through hyperlinks. In this paper, we study the problem of identifying the "information unit" of relevant pages that together contain all the input keywords as the answer. We model a set of closely related Web pages as a tree, where the nodes are the Web pages and the edges are the links between them. We retrieve the "information unit" of the most related and connected subtrees, instead of a single Web page, as the answer. To improve search efficiency, we propose an effective LCA-based algorithm to identify the subtrees that are most related to the given input keywords. We have conducted an extensive set of experiments on the proposed algorithm. The experimental results show that our method achieves high search performance and significantly outperforms the existing alternative methods.
Keywords :
Web sites; search engines; Internet technology; LCA-based keyword search; Web pages; Web search engines; Web sites; data structures; information unit; information unit retrieval; subtrees; Conference management; Information retrieval; Internet; Keyword search; Relational databases; Search engines; Tree data structures; Unsolicited electronic mail; Web pages; Web search; Information Unit; Keyword Search; LCA;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-0-7695-3185-4
Electronic_ISBN :
978-0-7695-3185-4
Type :
conf
DOI :
10.1109/WAIM.2008.15
Filename :
4596991