DocumentCode :
2136540
Title :
HUNTing the overlap
Author :
Iancu, Costin ; Husbands, Parry ; Hargrove, Paul
Author_Institution :
Computational Res. Div., Lawrence Berkeley Nat. Lab., CA, USA
fYear :
2005
fDate :
17-21 Sept. 2005
Firstpage :
279
Lastpage :
290
Abstract :
Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlapping communication with computation or other communication operations. Using non-blocking communication raises two issues: performance and programmability. In terms of performance, optimizers need to find a good communication schedule and are sometimes constrained by lack of full application knowledge. In terms of programmability, efficiently managing non-blocking communication can prove cumbersome for complex applications. In this paper we present the design principles of HUNT, a runtime system designed to search for and exploit some of the overlap available at execution time in UPC programs. Using virtual memory support, our runtime implements demand-driven synchronization for data involved in communication operations. It also employs message decomposition and scheduling heuristics to transparently improve the non-blocking behavior of applications. We provide a user-level implementation of HUNT on a variety of modern high performance computing systems. Results indicate that our approach is successful in finding some of the overlap available at execution time. While system and application characteristics influence performance, perhaps the determining factor is the time taken by the CPU to execute a signal handler. Demand-driven synchronization at execution time eliminates the need for the explicit management of non-blocking communication. Besides increasing programmer productivity, this feature also simplifies compiler analysis for communication optimizations.
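The general idea behind demand-driven synchronization through virtual memory support can be illustrated with a minimal POSIX C sketch. This is not HUNT's implementation: the names `transfer_t`, `get_nb`, `complete`, `alloc_pages` and `install_fault_handler` are hypothetical, "remote" data is simulated by a local source buffer, and only one pending transfer is tracked. The sketch assumes the technique works roughly as follows: a non-blocking get records the transfer and revokes access to the destination pages with `mprotect`; the first access to those pages faults into a signal handler, which waits for (here, simply performs) the transfer, restores access, and lets the faulting access restart.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and SA_SIGINFO on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record of one in-flight non-blocking get. */
typedef struct {
    char *dst;                     /* page-aligned destination buffer */
    const char *src;               /* stands in for remote memory */
    size_t len;                    /* bytes to transfer */
    volatile sig_atomic_t pending; /* 1 until the transfer is synchronized */
} transfer_t;

static size_t page_size;
static transfer_t *current;               /* single pending transfer keeps the sketch small */
static volatile sig_atomic_t fault_syncs; /* syncs triggered by an access fault */

static size_t round_to_pages(size_t n) {
    return (n + page_size - 1) / page_size * page_size;
}

/* Stand-in for "wait until the network operation finishes": the pages are
   made accessible again and the data is copied in. A real runtime would
   block on the communication handle instead of copying locally. */
static void complete(transfer_t *t) {
    mprotect(t->dst, round_to_pages(t->len), PROT_READ | PROT_WRITE);
    memcpy(t->dst, t->src, t->len);
    t->pending = 0;
}

/* Fault handler: synchronize on demand, only when the program first touches
   the protected destination. POSIX does not list mprotect as
   async-signal-safe; calling it here is common practice on Linux. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    char *addr = (char *)info->si_addr;
    transfer_t *t = current;
    if (t != NULL && t->pending && addr >= t->dst &&
        addr < t->dst + round_to_pages(t->len)) {
        complete(t);
        fault_syncs++;
        return; /* the faulting access is restarted and now succeeds */
    }
    /* Not one of our pages: restore the default action so the restarted
       access terminates the process like an ordinary segfault. */
    signal(sig, SIG_DFL);
}

static int install_fault_handler(void) {
    struct sigaction sa;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Linux reports access to a PROT_NONE page as SIGSEGV; macOS uses SIGBUS. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

static char *alloc_pages(size_t npages) {
    void *p = mmap(NULL, npages * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (char *)p;
}

/* "Non-blocking get": record the transfer, protect the destination pages and
   return immediately; no explicit wait call is needed later. */
static int get_nb(transfer_t *t, char *dst, const char *src, size_t len) {
    assert((uintptr_t)dst % page_size == 0); /* mprotect needs page alignment */
    t->dst = dst;
    t->src = src;
    t->len = len;
    t->pending = 1;
    current = t;
    return mprotect(dst, round_to_pages(len), PROT_NONE);
}
```

In use, a program calls `install_fault_handler()` once, issues `get_nb(&t, buf, src, n)` and then just reads `buf`; the first read synchronizes, later reads run at full speed. Each first touch of a protected region costs a page fault plus a signal handler invocation, which is consistent with the abstract's observation that signal-handling time is perhaps the determining performance factor.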
Keywords :
message passing; optimisation; parallel programming; program compilers; scheduling; HUNT; UPC programs; communication latency; communication optimizations; compiler analysis; demand-driven synchronization; high performance computing; message decomposition; nonblocking communication; overlapping communication; parallel program optimization; scheduling heuristics; virtual memory support; Concurrent computing; Delay; Laboratories; Libraries; Manuals; Optimizing compilers; Processor scheduling; Program processors; Programming profession; Runtime;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on
ISSN :
1089-795X
Print_ISBN :
0-7695-2429-X
Type :
conf
DOI :
10.1109/PACT.2005.25
Filename :
1515600