DocumentCode :
2939971
Title :
Monitoring Workflow Applications in Large Scale Distributed Systems
Author :
Sbirlea, Dragos ; Simion, Alina ; Pop, Florin ; Cristea, Valentin
Author_Institution :
Fac. of Automatics & Comput. Sci., Univ. Politeh. of Bucharest, Bucharest, Romania
fYear :
2009
fDate :
4-6 Nov. 2009
Firstpage :
162
Lastpage :
169
Abstract :
This paper presents the design, implementation and testing of a monitoring solution created for integration with a workflow execution platform. The monitoring solution continuously tracks the evolution of the system in order to facilitate performance tuning and improvement. Monitoring is performed at application level, by tracking each job of each workflow, and at system level, by aggregating state information from each processing node. The solution also computes aggregated statistics that allow improvements to the scheduling component of the system, with which it will interact. The performance of distributed applications is improved by using real-time information to compute runtime estimates, which are then used to improve scheduling. Another contribution is an automated error detection system, which can improve the robustness of the grid by enabling fault recovery mechanisms to be used. Both aspects benefit from specializing the monitoring system for workflow-based applications: scheduling performance can be improved through better runtime estimation, and several types of errors can be detected automatically. The proposed monitoring solution could be used in the SEEGRID project as part of the satellite image processing engine that is being built.
Keywords :
grid computing; image processing; workflow management software; SEEGRID project; aggregated statistics; automated error detection system; grid environment; large scale distributed systems; runtime estimation; satellite image processing engine; workflow execution platform; workflow monitoring; Computerized monitoring; Distributed computing; Estimation error; Fault detection; Large-scale systems; Processor scheduling; Robustness; Runtime; Statistical distributions; Testing; Grid Environment; Monitoring; Runtime Estimation; Satellite Image Processing; Workflow Applications;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Networking and Collaborative Systems, 2009. INCOS '09. International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-5165-4
Electronic_ISBN :
978-0-7695-3858-7
Type :
conf
DOI :
10.1109/INCOS.2009.73
Filename :
5370938