DocumentCode
2129906
Title
Improving join performance for skewed databases
Author
Cutt, Bryce ; Lawrence, Ramon
Author_Institution
Univ. of British Columbia Okanagan, Okanagan, BC
fYear
2008
fDate
4-7 May 2008
Abstract
The largest queries in data warehouses and decision support systems use hybrid hash join to relate information in multiple tables. Hybrid hash join operates independently of the data distributions of the join relations. Real-world data sets, however, are not uniformly distributed and often contain significant skew. Although partition skew has been studied for hash joins, no prior work has examined how exploiting data skew can improve performance. In this paper, we present histojoin, a join algorithm that uses histograms to identify data skew and improve join performance. Experimental results show that, for skewed data sets, histojoin performs significantly fewer I/O operations and is 20% to 60% faster than hybrid hash join.
Keywords
data warehouses; file organisation; query processing; data warehouse; decision support system; histojoin; hybrid hash join performance; skewed database; Cost benefit analysis; Cost function; Database systems; Decision support systems; Frequency; Histograms; Partitioning algorithms; Performance analysis; hash join; histogram; skew
fLanguage
English
Publisher
IEEE
Conference_Title
Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on
Conference_Location
Niagara Falls, ON
ISSN
0840-7789
Print_ISBN
978-1-4244-1642-4
Electronic_ISBN
0840-7789
Type
conf
DOI
10.1109/CCECE.2008.4564563
Filename
4564563