Title :
Optimizing Nonindexed Join Processing in Flash Storage-Based Systems
Author :
Yu Li ; Sai Tung On ; Jianliang Xu ; Byron Choi ; Haibo Hu
Author_Institution :
Dept. of Comput. Sci., Hong Kong Baptist Univ., Hong Kong, China
Abstract :
Flash memory-based disks (or simply flash disks) have been widely used in today's computer systems. With their continuously increasing capacity and dropping price, it is envisioned that some database systems will operate on flash disks in the near future. However, the I/O characteristics of flash disks differ from those of magnetic hard disks. Motivated by this, we study join processing, the core of query processing in row-based database systems, on flash storage media. More specifically, we propose a new framework, called DigestJoin, to optimize nonindexed join processing by reducing the intermediate result size and exploiting the fast random reads of flash disks. DigestJoin consists of two phases: 1) projecting the join attributes followed by a join on the projected attributes, and 2) fetching the full tuples that satisfy the join to produce the final join results. Since the problem of tuple/page fetching with the minimum I/O cost (in the second phase) is intractable, we propose three heuristic page-fetching strategies for flash disks. We have implemented DigestJoin and conducted extensive experiments on a real flash disk. Our evaluation results based on TPC-H data sets show that DigestJoin clearly outperforms the traditional sort-merge join and hash join under a wide range of system configurations.
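The two DigestJoin phases described in the abstract can be illustrated with a minimal Python sketch under loudly stated assumptions: tables are modeled in memory as lists of pages, a tuple ID is a hypothetical (page number, slot) pair, the join on projected attributes is a plain hash join, and phase 2 uses a naive placeholder policy (sort the result pairs by page, cache one page per table, count page reads). All function names are hypothetical, and the fetch policy is not one of the three heuristics proposed in the paper; it only shows where such a strategy plugs in.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory model of a row-store table: a list of pages,
# each page a list of tuples (dicts). A tuple ID is (page_no, slot).
Page = List[dict]
Table = List[Page]
TupleId = Tuple[int, int]


def project_digests(table: Table, key: str) -> List[Tuple[object, TupleId]]:
    """Phase 1a: scan once, keeping only (join key, tuple ID) digests."""
    return [(row[key], (p, s))
            for p, page in enumerate(table)
            for s, row in enumerate(page)]


def join_digests(r_dig, s_dig) -> List[Tuple[TupleId, TupleId]]:
    """Phase 1b: join the compact digests (here an in-memory hash join)."""
    buckets: Dict[object, List[TupleId]] = defaultdict(list)
    for k, tid in r_dig:
        buckets[k].append(tid)
    return [(r_tid, s_tid) for k, s_tid in s_dig for r_tid in buckets.get(k, [])]


def fetch_tuples(r: Table, s: Table, pairs, stats: Dict[str, int]) -> List[dict]:
    """Phase 2: fetch full tuples for each matching ID pair.

    Placeholder policy (an assumption, not the paper's heuristics): sort pairs
    by (R page, S page) and cache only the last page read from each table.
    stats counts simulated random page reads per table.
    """
    cache = {"r": (None, None), "s": (None, None)}

    def read(name: str, table: Table, page_no: int) -> Page:
        if cache[name][0] != page_no:
            stats[name] += 1  # one simulated random page read
            cache[name] = (page_no, table[page_no])
        return cache[name][1]

    results = []
    for (rp, rs), (sp, ss) in sorted(pairs):
        r_row = read("r", r, rp)[rs]
        s_row = read("s", s, sp)[ss]
        results.append({**r_row, **s_row})
    return results


# Usage on two tiny made-up tables joined on "k".
R = [[{"k": 1, "a": "x"}, {"k": 2, "a": "y"}], [{"k": 3, "a": "z"}]]
S = [[{"k": 2, "b": "p"}], [{"k": 1, "b": "q"}, {"k": 4, "b": "r"}]]

pairs = join_digests(project_digests(R, "k"), project_digests(S, "k"))
stats = {"r": 0, "s": 0}
result = fetch_tuples(R, S, pairs, stats)
print(result)  # [{'k': 1, 'a': 'x', 'b': 'q'}, {'k': 2, 'a': 'y', 'b': 'p'}]
print(stats)   # {'r': 1, 's': 2}
```

The point of the split is that phase 1 moves only small digests instead of full tuples, while phase 2 turns the output into random page reads, which are cheap on flash. The order in which pages are fetched determines the I/O cost, which is why the page-fetching strategy matters.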
Keywords :
Flash; Central Processing Unit; DigestJoin; Hard disks; Indexes; Query processing; Schedules; TPC-H data sets; database indexing; flash disks; flash memories; flash memory; flash memory-based disks; flash storage media; flash storage-based systems; hash join; heuristic page-fetching strategies; joins; nonindexed join processing optimization; page fetching; relational databases; row-based database systems; sort-merge join; tuple fetching;
Journal_Title :
IEEE Transactions on Computers