DocumentCode
1783294
Title
Victim Selection and Distributed Work Stealing Performance: A Case Study
Author
Perarnau, Swann ; Sato, Mitsuhisa
Author_Institution
RIKEN, AICS, Kobe, Japan
fYear
2014
fDate
19-23 May 2014
Firstpage
659
Lastpage
668
Abstract
Work stealing is a popular solution for dynamic load balancing of irregular computations, on both shared memory and distributed memory systems. While the shared memory performance of work stealing is well understood, distributing this algorithm across several thousand nodes can introduce new performance issues. In particular, most studies of work stealing assume that all participating processes are equidistant from each other in terms of communication latency. This paper presents a new performance evaluation of the popular UTS benchmark, in its work stealing implementation, at the scale of tens of thousands of compute nodes. Taking advantage of the physical scale of the K Computer, we investigate in detail the performance impact of communication latencies on work stealing. In particular, we introduce a new performance metric to assess the time needed by the work stealing scheduler to distribute work among all processes. Using this metric, we identify a previously overlooked issue: the victim selection function used by the work stealing application can severely impact its performance at large scale. To solve this issue, we introduce a new strategy that takes into account the physical distance between nodes, and we achieve significant performance improvements.
Keywords
distributed memory systems; performance evaluation; resource allocation; shared memory systems; K computer; UTS benchmark; communication latency; distributed work stealing performance; dynamic load balancing; performance metric; victim selection function; work stealing scheduler; Benchmark testing; Blades; Computers; Load management; Measurement; Memory management; Resource management; distributed load balancing; latency; work stealing
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.74
Filename
6877298