• DocumentCode
    2265485
  • Title

    Job centric cluster monitoring

  • Author

    Curry, Roger ; Simmonds, Rob

  • Author_Institution
    Dept. of Comput. Sci., Calgary Univ., Alta.
  • Volume
    1
  • fYear
    2006
  • fDate
    0-0 0
  • Abstract
    This paper describes a system for monitoring jobs on large computational clusters. The aim is to extract information that is most useful for understanding the complete life-cycle of a job, combining and organising data from multiple sources. Information is taken from the batch scheduler and from collectors running on each node. These collect information about processes associated with the jobs as well as general operating system and device statistics. Heuristics are applied to extract information that could help a client tune job submission strategy, to provide better throughput on this cluster and to determine how effectively the provisioned resources are being utilised. Data is stored for post-mortem analysis and data-mining by other tools. Ways of utilising this service in a grid computing environment are discussed.
  • Keywords
    grid computing; system monitoring; workstation clusters; data mining; high performance computing; information extraction; job centric cluster monitoring; Computer science; Computerized monitoring; Data mining; Grid computing; High performance computing; Operating systems; Organizing; Portals; Processor scheduling; Statistics; Cluster Computing; Grid Monitoring; High Performance Computing; Job Monitoring
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    1521-9097
  • Print_ISBN
    0-7695-2612-8
  • Type

    conf

  • DOI
    10.1109/ICPADS.2006.54
  • Filename
    1655705