Title :
Analysis of job execution reliability in a grid through job accounting tool
Author :
Sunny, Jibin ; Divya, M.G. ; Chattopadhyay, Subrata
Author_Institution :
Center for Dev. of Adv. Comput., Bangalore, India
Abstract :
Computational grids and high-performance computing resources are widely used by researchers and academics for research activities, so stable and reliable computational resources are essential. To achieve this, knowing job statistics is as important as monitoring network health, service health, and similar indicators; the Job Accounting Tool (JAT) was designed for this purpose. JAT is a Web-based tool that captures job statistics from the grid metascheduler GridWay and the local resource manager Portable Batch System (PBS). The job statistics include done, failed, and killed jobs, along with other parameters such as wall time, exit status, and user name. The major objective of the tool is to identify and analyze the reasons for job failure. It is deployed and functioning at the national grid computing initiative GARUDA. This paper demonstrates various job statistics and job failure analyses.
Keywords :
accounting; graphical user interfaces; grid computing; GARUDA; JAT; Web based tool design; computational grid; computational resource; done jobs; exit status; failed jobs; grid metascheduler; gridway; high-performance computing resources; job accounting tool; job execution reliability analysis; job failure reason analysis; job failure reason identification; job statistics; killed jobs; local resource manager-portable batch system; national grid computing initiative; research activities; user name; wall time; Databases; Failure analysis; Grid computing; Magnetic heads; Monitoring; Object recognition; Reliability; Grid; Gridway; Job accounting; Reliability; Torque;
Conference_Title :
Data Science & Engineering (ICDSE), 2014 International Conference on
Conference_Location :
Kochi
Print_ISBN :
978-1-4799-6870-1
DOI :
10.1109/ICDSE.2014.6974612