DocumentCode :
1397887
Title :
Performance analysis of three text-join algorithms
Author :
Meng, Weiyi ; Yu, Clement ; Wang, Wei ; Rishe, Naphtali
Author_Institution :
Dept. of Comput. Sci., State Univ. of New York, Binghamton, NY, USA
Volume :
10
Issue :
3
Year :
1998
Firstpage :
477
Lastpage :
492
Abstract :
When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against the global schema of the multidatabase system may contain a new type of join: joins between attributes of textual type. This paper presents three algorithms for processing such joins and analyzes their I/O costs. Because such joins often involve very large document collections, it is important to find efficient algorithms to process them. The three algorithms differ in whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and simulation results indicate that the relative performance of these algorithms depends on the input document collections, system characteristics, and the input query. For each algorithm, the type of input document collection with which it is likely to perform well is identified. An integrated algorithm that automatically selects the best algorithm to use is also proposed.
Keywords :
distributed databases; query processing; I/O costs; document collections; information retrieval systems; input document collections; inverted files; multidatabase system; text-join algorithms; textual database systems; Algorithm design and analysis; Analytical models; Computer Society; Costs; Data models; Database languages; Database systems; Information retrieval; Performance analysis; Relational databases
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/69.687979
Filename :
687979