• DocumentCode
    787137
  • Title

    Optimizing main-memory join on modern hardware

  • Author

    Manegold, Stefan ; Boncz, Peter ; Kersten, Martin

  • Author_Institution
    CWI, Amsterdam, Netherlands
  • Volume
    14
  • Issue
    4
  • fYear
    2002
  • Firstpage
    709
  • Lastpage
    730
  • Abstract
    In the past decade, the exponential growth in commodity CPU\´s speed has far outpaced advances in memory latency. A second trend is that CPU performance advances are not only brought by increased clock rates, but also by increasing parallelism inside the CPU. Current database systems have not yet adapted to these trends and show poor utilization of both CPU and memory resources on current hardware. In this paper, we show how these resources can be optimized for large joins and translate these insights into guidelines for future database architectures, encompassing data structures, algorithms, cost modeling and implementation. In particular, we discuss how vertically fragmented data structures optimize cache performance on sequential data access. On the algorithmic side, we refine the partitioned hash-join with a new partitioning algorithm called "radix-cluster", which is specifically designed to optimize memory access. The performance of this algorithm is quantified using a detailed analytical model that incorporates memory access costs in terms of a limited number of parameters, such as cache sizes and miss penalties. We also present a calibration tool that extracts such parameters automatically from any computer hardware. The accuracy of our models is proven by exhaustive experiments conducted with the Monet database system on three different hardware platforms. Finally, we investigate the effect of implementation techniques that optimize CPU resource usage. Our experiments show that large joins can be accelerated almost an order of magnitude on modern RISC hardware when both memory and CPU resources are optimized
  • Keywords
    cache storage; data structures; database machines; optimisation; performance evaluation; query processing; CPU performance; CPU resource usage optimization; Monet database system; RISC hardware; accuracy; algorithm performance; algorithms; analytical model; automatic parameter extraction; cache performance optimization; cache sizes; calibration tool; clock rate; commodity CPU speed; cost modeling; database systems; decomposed storage model; future database architectures; hardware platforms; implementation techniques; main-memory databases; main-memory join optimization; memory access costs; memory access optimization; memory latency; miss penalties; modern hardware; parallelism; partitioned hash-join; query processing; radix-cluster partitioning algorithm; resource utilization; sequential data access; vertically fragmented data structures; Algorithm design and analysis; Clocks; Cost function; Data structures; Database systems; Delay; Guidelines; Hardware; Parallel processing; Partitioning algorithms;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2002.1019210
  • Filename
    1019210