Title :
Parallel implementations of exclusion joins
Author_Institution :
Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., Kowloon, Hong Kong
Abstract :
This paper examines the parallel processing of exclusion join in a shared-nothing multiprocessor environment. First, a parallel hash-based exclusion join algorithm is presented. Unlike the case of equijoin, this algorithm does not work correctly in the presence of nulls in the join attributes. One solution is to restrict the hash-on attributes to non-nullable fields. However, this can lead to the well known data skew problem. If the number of tuples containing null values in their join attributes is small, an alternative is to replicate those tuples to all processors. Otherwise, we can consider a range partitioning algorithm where those tuples are only sent to a small subset of the processors. The hash-based algorithm usually outperforms the range partitioning algorithm except when the number of tuples containing null values in their join attributes is large or when the data is highly skewed
Keywords :
database theory; distributed databases; parallel algorithms; query processing; data skew problem; exclusion join; hash-based algorithm; hash-on attributes; parallel hash-based exclusion join algorithm; parallel processing; range partitioning algorithm; shared-nothing multiprocessor environment; Computer science; Database machines; Joining IEEE; Parallel processing; Partitioning algorithms;
Conference_Titel :
Parallel and Distributed Processing, 1993. Proceedings of the Fifth IEEE Symposium on
Conference_Location :
Dallas, TX
Print_ISBN :
0-8186-4222-X
DOI :
10.1109/SPDP.1993.395458