DocumentCode
2787555
Title
A Performance Prediction Framework for Grid-Based Data Mining Applications
Author
Glimcher, Leonid ; Agrawal, Gagan
Author_Institution
Department of Computer Science and Engineering, Ohio State University, Columbus OH 43210. glimcher@cse.ohio-state.edu
fYear
2007
fDate
26-30 March 2007
Firstpage
1
Lastpage
10
Abstract
For a grid middleware to perform resource allocation, it needs prediction models that can determine how long an application will take to complete on a particular platform or configuration. In this paper, we take the approach that, by focusing on the characteristics of the class of applications a middleware is suited for, we can develop simple performance models that are very accurate in practice. The particular middleware we consider is FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid), which supports a high-level interface for developing data mining and scientific data processing applications that involve data stored in remote repositories. The FREERIDE-G system needs detailed performance models to perform resource selection, i.e., choosing the computing nodes and a replica of the dataset. This paper presents and evaluates such a performance model. By exploiting the fact that the processing structure of data mining and scientific data analysis applications developed on FREERIDE-G involves generalized reductions, we are able to develop an accurate performance prediction model. We have evaluated our model using implementations of three well-known data mining algorithms and two scientific data analysis applications developed using FREERIDE-G. Results from these five applications show that we are able to accurately predict execution times as we vary the number of storage nodes, the number of nodes available for computation, the dataset size, the network bandwidth, and the underlying hardware.
Keywords
Bandwidth; Computer networks; Data analysis; Data mining; Data processing; Engines; Hardware; Middleware; Predictive models; Resource management;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
Conference_Location
Long Beach, CA, USA
Print_ISBN
1-4244-0910-1
Electronic_ISBN
1-4244-0910-1
Type
conf
DOI
10.1109/IPDPS.2007.370275
Filename
4228003