• DocumentCode
    3198221
  • Title

    An effective algorithm for parallelizing hash joins in the presence of data skew

  • Author

    Wolf, J.L. ; Dias, Douglas M. ; Yu, Philip S.

  • Author_Institution
    IBM Thomas J. Watson Res. Center, Yorktown Heights, NY
  • fYear
    1991
  • fDate
    8-12 Apr 1991
  • Firstpage
    200
  • Lastpage
    209
  • Abstract
    A parallel hash join algorithm based on the concept of hierarchical hashing is proposed to address the problem data skew. The proposed algorithm adds an extra scheduling phase to the usual hash and join phases. During the scheduling phase, a heuristic optimization algorithm, using the output of the hash phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the hash partitions with the largest skew elements, splits them up, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of the elements in the join column, the algorithm is shown to achieve good load balancing for the join phase in a CPU-bound environment, and it is shown to be fairly robust relative to the degree of data skew and the total number of processors. The overall speedup due to this algorithm is compared with conventional parallel hash join methods and the considerable advantage of the proposed method is demonstrated even when the additional scheduling phase is taken into account
  • Keywords
    file organisation; heuristic programming; optimisation; scheduling; Zipf-like distribution; algorithm; data skew; heuristic optimization algorithm; hierarchical hashing; load balancing; parallelizing hash joins; scheduling phase; Database systems; Heuristic algorithms; Load management; Partitioning algorithms; Processor scheduling; Proposals; Prototypes; Relational databases; Robustness; Scheduling algorithm;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 1991. Proceedings. Seventh International Conference on
  • Conference_Location
    Kobe
  • Print_ISBN
    0-8186-2138-9
  • Type

    conf

  • DOI
    10.1109/ICDE.1991.131467
  • Filename
    131467