• Title of article

    Using intrinsic data skew to improve hash join performance

  • Author/Authors

    Bryce Cutt (author), Ramon Lawrence (author)

  • Issue Information
    Journal with serial number, year 2009
  • Pages
    18
  • From page
    493
  • To page
    510
  • Abstract
    Hash join is used to join large, unordered relations and operates independently of the data distributions of the join relations. Real-world data sets are not uniformly distributed and often contain significant skew. Although partition skew has been studied for hash joins, no prior work has examined how exploiting data skew can improve the performance of hash join. In this paper, we present histojoin, a join algorithm that uses histograms to identify data skew and improve join performance. Experimental results show that for skewed data sets histojoin performs significantly fewer I/O operations and is faster by 10–60% than hybrid hash join.
  • Keywords
    Hybrid hash join, Skew, Histogram, Partition, Distribution
  • Journal title
    Information Systems
  • Serial Year
    2009
  • Record number

    1230103