Title :
Monitoring Workflow Applications in Large Scale Distributed Systems
Author :
Sbirlea, Dragos ; Simion, Alina ; Pop, Florin ; Cristea, Valentin
Author_Institution :
Fac. of Automatics & Comput. Sci., Univ. Politeh. of Bucharest, Bucharest, Romania
Abstract :
This paper presents the design, implementation and testing of a monitoring solution created for integration with a workflow execution platform. The monitoring solution continuously tracks the evolution of the system in order to facilitate performance tuning and improvement. Monitoring is performed at the application level, by tracking each job of each workflow, and at the system level, by aggregating state information from each processing node. The solution also computes aggregated statistics that allow improvements to the scheduling component of the system, with which it will interact. The performance of distributed applications is improved by using real-time information to compute runtime estimates, which are then used to improve scheduling. Another contribution is an automated error detection system, which can improve the robustness of the grid by enabling fault recovery mechanisms to be used. Both aspects benefit from tailoring the monitoring system to workflow-based applications: scheduling performance can be improved through better runtime estimation, and the error detection component can automatically detect several types of errors. The proposed monitoring solution could be used in the SEEGRID project as part of the satellite image processing engine that is being built.
Keywords :
grid computing; image processing; workflow management software; SEEGRID project; aggregated statistics; automated error detection system; grid environment; large scale distributed systems; runtime estimation; satellite image processing engine; workflow execution platform; workflow monitoring; Computerized monitoring; Distributed computing; Estimation error; Fault detection; Large-scale systems; Processor scheduling; Robustness; Runtime; Statistical distributions; Testing; Grid Environment; Monitoring; Runtime Estimation; Satellite Image Processing; Workflow Applications;
Conference_Title :
Intelligent Networking and Collaborative Systems, 2009. INCOS '09. International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-5165-4
Electronic_ISBN :
978-0-7695-3858-7
DOI :
10.1109/INCOS.2009.73