Title :
Binary Search Join between an IR System and an RDBMS
Author :
Wang, Ernest Dawei ; Luo, Qiong ; Yang, Dongqing ; Tang, Shiwei
Author_Institution :
Dept. of Comput. Sci., Peking Univ., Beijing
Abstract :
Integrating relational database technologies into Web information retrieval enables users to ask complex queries beyond traditional keyword searches over Web pages. One approach to this integration is to place a software layer on top of an information retrieval (IR) system and an RDBMS (relational database management system). A core operation in this top layer is to join the intermediate results from the two underlying systems (called the IR results and the DB results, respectively) in order to produce the final ranked results for each query. Unfortunately, most conventional join algorithms are inefficient for this operation. In this paper, we propose a simple join algorithm, called binary search join (BSJ), for joining the IR results and the DB results. This algorithm takes advantage of the fact that the IR results are already ranked by relevance and that the DB results are already sorted by the join attribute. It scans the IR results and, for each IR result tuple, performs a binary search over the DB results. We analytically and empirically study the performance of BSJ in comparison with several conventional join algorithms on a repository of Chinese news Web pages. The experimental results show that BSJ performs best in most cases.
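The abstract describes BSJ only at a high level: scan the IR results in rank order and, for each IR tuple, binary-search the DB results on the join attribute. The Python sketch below illustrates that idea under assumed data shapes: IR tuples are `(doc_id, score)` pairs ordered by descending relevance, and DB tuples carry the join key in their first field and are sorted ascending by it. The function name `binary_search_join` and the optional top-k cutoff `k` are hypothetical choices for illustration, not the authors' implementation.

```python
from bisect import bisect_left
from typing import Any, List, Optional, Sequence, Tuple

# Hypothetical shapes (assumptions, not from the paper):
#   IR result: (doc_id, relevance_score), sorted by descending score
#   DB result: (doc_id, *attributes), sorted ascending by doc_id
IRTuple = Tuple[Any, float]
DBTuple = Tuple[Any, ...]


def binary_search_join(
    ir_results: Sequence[IRTuple],
    db_results: Sequence[DBTuple],
    k: Optional[int] = None,
) -> List[Tuple[Any, float, DBTuple]]:
    """Join ranked IR results with key-sorted DB results, keeping IR rank order.

    k is an assumed optional cutoff: stop once k joined tuples are produced.
    """
    if k is not None and k <= 0:
        return []
    # Copy the sorted join keys once so each probe is a plain bisect.
    db_keys = [row[0] for row in db_results]
    joined: List[Tuple[Any, float, DBTuple]] = []
    for doc_id, score in ir_results:          # scan IR results in rank order
        i = bisect_left(db_keys, doc_id)      # binary search: O(log |DB|)
        # The DB side may hold several rows with the same join key.
        while i < len(db_keys) and db_keys[i] == doc_id:
            joined.append((doc_id, score, db_results[i]))
            if k is not None and len(joined) >= k:
                return joined                 # top-k already found
            i += 1
    return joined


if __name__ == "__main__":
    ir = [(42, 0.93), (7, 0.81), (15, 0.64), (3, 0.40)]
    db = [(3, "2006-05-01"), (7, "2006-06-12"),
          (15, "2006-07-30"), (15, "2006-08-02")]
    print(binary_search_join(ir, db))
    print(binary_search_join(ir, db, k=2))
```

Because the output follows the IR scan order, the joined tuples inherit the relevance ranking without a final sort, and each probe costs O(log |DB|), for O(|IR| log |DB|) overall. Copying the keys into `db_keys` is a convenience of this sketch; on Python 3.10+ the `key` argument of `bisect_left` avoids that linear pass.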
Keywords :
information retrieval; information retrieval systems; portals; relational databases; Chinese news Web page repository; RDBMS; Web information retrieval; Web page; binary search join; binary search join algorithm; information retrieval system; join attribute sorting; keyword search; relational database management system; relational database technology integration; relevance ranking; Algorithm design and analysis; Computer science; Information analysis; Information retrieval; Keyword search; Performance analysis; Portals; Relational databases; Sorting; Web pages;
Conference_Title :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7