DocumentCode :
787137
Title :
Optimizing main-memory join on modern hardware
Author :
Manegold, Stefan ; Boncz, Peter ; Kersten, Martin
Author_Institution :
CWI, Amsterdam, Netherlands
Volume :
14
Issue :
4
fYear :
2002
Firstpage :
709
Lastpage :
730
Abstract :
In the past decade, the exponential growth in commodity CPU\´s speed has far outpaced advances in memory latency. A second trend is that CPU performance advances are not only brought by increased clock rates, but also by increasing parallelism inside the CPU. Current database systems have not yet adapted to these trends and show poor utilization of both CPU and memory resources on current hardware. In this paper, we show how these resources can be optimized for large joins and translate these insights into guidelines for future database architectures, encompassing data structures, algorithms, cost modeling and implementation. In particular, we discuss how vertically fragmented data structures optimize cache performance on sequential data access. On the algorithmic side, we refine the partitioned hash-join with a new partitioning algorithm called "radix-cluster", which is specifically designed to optimize memory access. The performance of this algorithm is quantified using a detailed analytical model that incorporates memory access costs in terms of a limited number of parameters, such as cache sizes and miss penalties. We also present a calibration tool that extracts such parameters automatically from any computer hardware. The accuracy of our models is proven by exhaustive experiments conducted with the Monet database system on three different hardware platforms. Finally, we investigate the effect of implementation techniques that optimize CPU resource usage. Our experiments show that large joins can be accelerated almost an order of magnitude on modern RISC hardware when both memory and CPU resources are optimized
Keywords :
cache storage; data structures; database machines; optimisation; performance evaluation; query processing; CPU performance; CPU resource usage optimization; Monet database system; RISC hardware; accuracy; algorithm performance; algorithms; analytical model; automatic parameter extraction; cache performance optimization; cache sizes; calibration tool; clock rate; commodity CPU speed; cost modeling; database systems; decomposed storage model; future database architectures; hardware platforms; implementation techniques; main-memory databases; main-memory join optimization; memory access costs; memory access optimization; memory latency; miss penalties; modern hardware; parallelism; partitioned hash-join; query processing; radix-cluster partitioning algorithm; resource utilization; sequential data access; vertically fragmented data structures; Algorithm design and analysis; Clocks; Cost function; Data structures; Database systems; Delay; Guidelines; Hardware; Parallel processing; Partitioning algorithms;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2002.1019210
Filename :
1019210
Link To Document :
بازگشت