Title :
There goes the neighborhood: Performance degradation due to nearby jobs
Author :
Bhatele, Abhinav ; Mohror, Kathryn ; Langer, Steven H. ; Isaacs, Katherine E.
Author_Institution :
Lawrence Livermore National Laboratory, Livermore, CA, USA
Abstract :
Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and determining the allocation requests for proposals. Our experiments show that on a Cray XE system, the execution time of a communication-heavy parallel application ranges from 28% faster to 41% slower than the average observed performance. Blue Gene systems, on the other hand, demonstrate no noticeable run-to-run variability. In this paper, we focus on Cray machines and investigate potential causes for performance variability such as OS jitter, shape of the allocated partition, and interference from other jobs sharing the same network links. Reducing such variability could improve overall throughput at a computer center and save energy costs.
Keywords :
Cray computers; parallel processing; performance evaluation; Blue Gene systems; Cray XE system; Cray machines; OS jitter; allocation requests; batch jobs; communication-heavy parallel application; compiler; computer center; energy cost saving; network links; performance degradation; performance variability; predictable performance; source code; system software changes; time estimation; Interference; Laser beams; Message passing; Resource management; Shape; Three-dimensional displays; Topology; communication performance; system noise; torus networks
Conference_Title :
2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Conference_Location :
Denver, CO
Print_ISBN :
978-1-4503-2378-9
DOI :
10.1145/2503210.2503247