DocumentCode :
971193
Title :
Optimizing the execution of multiple data analysis queries on parallel and distributed environments
Author :
Andrade, Henrique ; Kurc, Tahsin ; Sussman, Alan ; Saltz, Joel
Author_Institution :
Dept. of Comput. Sci., Maryland Univ., College Park, MD, USA
Volume :
15
Issue :
6
fYear :
2004
fDate :
6/1/2004
Firstpage :
520
Lastpage :
532
Abstract :
We investigate techniques for efficiently executing multiquery workloads from data- and computation-intensive applications in parallel and/or distributed computing environments. In this context, we describe a database optimization framework that supports data and computation reuse, query scheduling, and active semantic caching to speed up the evaluation of multiquery workloads. Its most striking feature is the ability to optimize the execution of queries in the presence of application-specific constructs by employing a customizable data and computation reuse model. Furthermore, we discuss how the proposed optimization model is flexible enough to work efficiently regardless of the underlying parallel or distributed environment. To evaluate the proposed optimization techniques, we present experimental evidence using real data analysis applications. For this purpose, a common implementation of the queries under study was provided according to the database optimization framework and deployed on three distinct experimental configurations: a shared memory multiprocessor, a cluster of workstations, and a distributed computational Grid-like environment.
Keywords :
cache storage; data analysis; grid computing; multiprocessing systems; parallel databases; query processing; resource allocation; workstation clusters; active semantic caching; application-specific construct; cluster computing; computation reuse model; computation-intensive application; database optimization; distributed environments; multiple data analysis query; multiquery workload; parallel database; query scheduling; shared memory multiprocessor; Computational modeling; Computer applications; Concurrent computing; Distributed computing; Distributed databases; Processor scheduling; Spatial databases; Workstations; Multiquery optimization; data analysis applications; symmetric multiprocessing
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2004.11
Filename :
1291821