Title :
Tape-disk join strategies under disk contention
Author :
Kraiss, Achim ; Muth, Peter ; Gillmann, Michael
Author_Institution :
Dept. of Comput. Sci., Saarlandes Univ., Saarbrucken, Germany
Abstract :
Large-scale data warehousing, data mining and scientific applications require the analysis of terabytes of factual data accumulated over long periods of time. Tape libraries are suitable for storing such mass data. The online analytical processing (OLAP) of this data typically leads to long-running aggregation queries joining the tape-resident fact relations with disk-resident dimension relations. During the join execution, the disks storing the dimension relations are often not dedicated to the join. They are subject to reads and writes invoked by concurrently running applications. In many cases, the performance of these concurrent applications should not be degraded too much by the processing of the join. We present an accurate model for analysing the performance of three different tape-disk join strategies in multi-query systems. The major contributions are: (a) a cost model considering tape and disk bandwidth, tape and disk latencies, available buffer sizes, CPU costs and the selectivity of filters on tape data; (b) disk queueing effects due to concurrent reads and writes at the disk; and (c) two disk scheduling strategies. We show the superiority of a disk scheduling strategy that gives preference to the servicing of the concurrent disk load. We present a strategy for dynamically selecting the most beneficial join algorithm and its parameters at run time. We have implemented the join strategies in a prototype system based on detailed simulations of secondary and tertiary storage devices. Our evaluations confirm that the model is very accurate and a suitable basis for run-time strategy decisions.
Keywords :
concurrency control; data mining; data warehouses; magnetic disc storage; magnetic tape storage; query processing; scheduling; scientific information systems; software prototyping; storage management; CPU costs; aggregation queries; algorithm parameter selection; available buffer sizes; concurrent disk load servicing; concurrent reads; concurrent writes; concurrently running applications; cost model; disk bandwidth; disk contention; disk latency; disk queueing effects; disk scheduling strategies; disk-resident dimension relations; dynamic run-time join algorithm selection; factual data analysis; large-scale data warehousing; model accuracy; multi-query systems; online analytical processing; performance degradation; prototype system; run-time strategy decisions; scientific applications; secondary storage devices; simulations; tape bandwidth; tape data filter selectivity; tape latency; tape libraries; tape-disk join strategies; tape-resident fact relations; tertiary storage devices; Bandwidth; Costs; Data analysis; Data mining; Degradation; Delay; Large-scale systems; Libraries; Performance analysis; Warehousing
Conference_Title :
Data Engineering, 1999. Proceedings., 15th International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
0-7695-0071-4
DOI :
10.1109/ICDE.1999.754971