Title :
Towards eliminating random I/O in hash joins
Author :
Lo, Ming-Ling ; Ravishankar, Chinya V.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Michigan Univ., Ann Arbor, MI, USA
Date :
26 Feb-1 Mar 1996
Abstract :
The widening performance gap between CPUs and disks has a significant impact on hash join performance. Most current hash join methods try to reduce the volume of data transferred between memory and disk. In this paper, we try to reduce hash-join times by reducing random I/O. We study how current algorithms incur random I/O, and propose a new hash join method, Seq+, that converts much of the random I/O to sequential I/O. Seq+ uses a new organization for hash buckets on disk, and larger input and output buffer sizes. We introduce the technique of batch writes to reduce the bucket-write cost, and the concepts of write- and read-groups of hash buckets to reduce the bucket-read cost. We derive a cost model for our method and present formulas for choosing various algorithm parameters, including input and output buffer sizes. Our performance study shows that the new hash join method performs many times better than current algorithms in various environments. Since our cost functions underestimate the cost of current algorithms and overestimate the cost of Seq+, the actual performance gain of Seq+ is likely to be even greater.
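The batch-write idea in the abstract — buffering tuples per hash bucket and flushing all bucket buffers together as one contiguous write, instead of issuing one random write per bucket — can be illustrated with a minimal sketch. This is not the paper's Seq+ algorithm; the function names, buffer sizes, and the use of an in-memory file to stand in for the disk are all illustrative assumptions.

```python
import io
from collections import defaultdict

NUM_BUCKETS = 4    # illustrative; the paper derives such parameters from a cost model
BUFFER_LIMIT = 3   # tuples buffered per bucket before a batch flush (illustrative)

def batch_flush(buffers, disk, layout):
    """Flush every non-empty bucket buffer contiguously: one long
    sequential write (a 'write-group') rather than many scattered
    random writes, one per bucket."""
    for b in sorted(buffers):
        data = "".join(f"{t}\n" for t in buffers[b])
        if data:
            # Remember where each bucket fragment landed so the
            # read phase can gather it back later.
            layout.append((b, disk.tell(), len(data)))
            disk.write(data)
        buffers[b].clear()

def hash_partition(tuples, disk):
    """Partition tuples into hash buckets, writing buckets to disk
    in batches instead of tuple-at-a-time."""
    buffers = defaultdict(list)
    layout = []   # (bucket_id, offset, length) records
    buffered = 0
    for t in tuples:
        buffers[hash(t) % NUM_BUCKETS].append(t)
        buffered += 1
        if buffered >= NUM_BUCKETS * BUFFER_LIMIT:
            batch_flush(buffers, disk, layout)
            buffered = 0
    batch_flush(buffers, disk, layout)  # flush the remainder
    return layout
```

A usage example: `disk = io.StringIO(); layout = hash_partition(list(range(10)), disk)` leaves the fragments of each bucket at the contiguous offsets recorded in `layout`, so the subsequent probe phase can read each bucket's fragments as a small number of sequential reads.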
Keywords :
buffer storage; file organisation; input-output programs; random processes; software performance evaluation; CPU performance; Seq+ hash join method; algorithm parameter selection; batch writes; bucket-read cost; bucket-write cost; buffer sizes; cost model; disk performance; hash bucket organization; hash join performance; random input/output elimination; read-groups; sequential I/O; write-groups; Costs; Geoscience; Spatial databases; Stability; Writing;
Conference_Titel :
Proceedings of the Twelfth International Conference on Data Engineering, 1996
Conference_Location :
New Orleans, LA
Print_ISBN :
0-8186-7240-4
DOI :
10.1109/ICDE.1996.492191