DocumentCode
174523
Title
Analysis of job execution reliability in a grid through job accounting tool
Author
Sunny, Jibin ; Divya, M.G. ; Chattopadhyay, Subrata
Author_Institution
Center for Dev. of Adv. Comput., Bangalore, India
fYear
2014
fDate
26-28 Aug. 2014
Firstpage
57
Lastpage
61
Abstract
Computational grids and high-performance computing resources are widely used by researchers and academicians for research activities, so stable and reliable computational resources are essential. To achieve this, knowing job statistics is as important as monitoring network health, service health, and similar indicators; the Job Accounting Tool (JAT) is designed to provide this functionality. JAT is a Web-based tool that captures job statistics from the grid metascheduler GridWay and the local resource manager Portable Batch System. Job statistics include done, failed, and killed jobs, along with other parameters such as wall time, exit status, and user name. The major objective of the tool is to identify and analyze the reasons for job failure. It is deployed and functioning at the national grid computing initiative GARUDA. Various job statistics and job failure analyses are demonstrated in this paper.
Keywords
accounting; graphical user interfaces; grid computing; GARUDA; JAT; Web based tool design; computational grid; computational resource; done jobs; exit status; failed jobs; grid metascheduler; gridway; high-performance computing resources; job accounting tool; job execution reliability analysis; job failure reason analysis; job failure reason identification; job statistics; killed jobs; local resource manager-portable batch system; national grid computing initiative; research activities; user name; wall time; Databases; Failure analysis; Grid computing; Magnetic heads; Monitoring; Object recognition; Reliability; Grid; Gridway; Job accounting; Reliability; Torque;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Science & Engineering (ICDSE), 2014 International Conference on
Conference_Location
Kochi
Print_ISBN
978-1-4799-6870-1
Type
conf
DOI
10.1109/ICDSE.2014.6974612
Filename
6974612