Title :
Performance of the pipelined hash-join algorithm in a heterogeneous distributed environment
Author_Institution :
Dept. of Math. & Comput. Sci., Bloomsburg Univ., PA, USA
Abstract :
A pipelined distributed parallel hash-join algorithm is executed in a distributed heterogeneous supercomputing environment consisting of the Connection Machine CM2 and the Cray C90. This algorithm implements the computationally intensive join operation of relational databases. The hash and join phases of the algorithm are executed on the architectures determined to be best suited for them. The hash phase of the algorithm is implemented on the Cray C90. The hashed data sets of the first join relation are transmitted from the Cray to the CM2. A pipeline is established between the two machines as the Cray continues to hash each page of the second join relation and transmits it to the CM2, where the join is performed. The limited performance improvements of the pipelined algorithm are analyzed for different combinations of data sizes, data distributions, and join sizes, and the limitations of the distributed environment are discussed.
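The pipeline described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: two Python threads stand in for the Cray C90 (hash stage) and the CM2 (join stage), and a bounded queue stands in for the link between them. All names (`hash_page`, `pages`, `pipelined_hash_join`, `NUM_BUCKETS`, `PAGE_SIZE`) and the bucket and page sizes are assumptions made for illustration only.

```python
import queue
import threading
from collections import defaultdict

NUM_BUCKETS = 8   # hypothetical bucket count, not taken from the paper
PAGE_SIZE = 2     # hypothetical number of tuples per page
_DONE = object()  # sentinel marking the end of the page stream


def hash_page(page):
    """Hash stage: tag each (key, value) tuple with its bucket number."""
    return [(hash(key) % NUM_BUCKETS, key, value) for key, value in page]


def pages(relation, size=PAGE_SIZE):
    """Split a relation (a list of tuples) into fixed-size pages."""
    for start in range(0, len(relation), size):
        yield relation[start:start + size]


def pipelined_hash_join(r, s):
    """Equi-join r and s on the first field of each tuple."""
    # Hash the whole first relation and hand it to the join side up front.
    table = defaultdict(lambda: defaultdict(list))
    for page in pages(r):
        for bucket, key, value in hash_page(page):
            table[bucket][key].append(value)

    # Bounded queue: the stand-in for the link between the two machines.
    channel = queue.Queue(maxsize=4)

    def hash_stage():
        # Keeps hashing pages of the second relation while the join runs.
        for page in pages(s):
            channel.put(hash_page(page))
        channel.put(_DONE)

    producer = threading.Thread(target=hash_stage)
    producer.start()

    # Join stage: probe each arriving page against the hashed first relation.
    results = []
    while True:
        hashed = channel.get()
        if hashed is _DONE:
            break
        for bucket, key, s_value in hashed:
            for r_value in table[bucket].get(key, []):
                results.append((key, r_value, s_value))
    producer.join()
    return results
```

For example, `pipelined_hash_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")])` yields the single matching tuple `(2, "b", "x")`. The bounded queue makes the producer block when the join side falls behind, which is one simple way to model a pipeline limited by its slower stage or by the link.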
Keywords :
file organisation; multiprocessing systems; parallel algorithms; pipeline processing; relational databases; software performance evaluation; CM2 Connection Machine; Cray C90; computationally intensive join operation; data distributions; data sizes; distributed heterogeneous supercomputing environment; hashed data sets; join sizes; pipelined distributed parallel hash-join algorithm performance; relational databases; Algorithm design and analysis; Computer architecture; Computer science; Database machines; Hardware; Mathematics; Partitioning algorithms; Performance analysis; Pipelines; Relational databases;
Conference_Title :
Parallel and Distributed Processing, 1998. PDP '98. Proceedings of the Sixth Euromicro Workshop on
Conference_Location :
Madrid
Print_ISBN :
0-8186-8332-5
DOI :
10.1109/EMPDP.1998.647237