DocumentCode :
3678434
Title :
Practical Resource Monitoring for Robust High Throughput Computing
Author :
Gideon Juve;Benjamin Tovar;Rafael Ferreira da Silva; Król;Douglas Thain;Ewa Deelman;William Allcock;Miron Livny
Author_Institution :
Inf. Sci. Inst., Univ. of Southern California, Marina Del Rey, CA, USA
fYear :
2015
Firstpage :
650
Lastpage :
657
Abstract :
Robust high throughput computing requires effective monitoring and enforcement of a variety of resources including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or underutilize machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources used by computational tasks. We focus on tasks running in distributed execution systems, in which a task requests the resources it needs, and the execution system ensures the availability of such resources. This presents two non-trivial problems: how to measure the resources consumed by a task, and how to monitor and report resource exhaustion in a robust and timely manner. For both of these problems, operating systems have a variety of mechanisms with different degrees of availability, accuracy, overhead, and intrusiveness. We describe various forms of monitoring and the available mechanisms in contemporary operating systems. We then present two specific monitoring tools that choose different tradeoffs in overhead and accuracy, and evaluate them on a selection of benchmarks.
Keywords :
"Monitoring","Libraries","Linux","Kernel","Probes","Radiation detectors"
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/CLUSTER.2015.115
Filename :
7307664