Title :
Virtual Topologies for Scalable Resource Management and Contention Attenuation in a Global Address Space Model on the Cray XT5
Author :
Yu, Weikuan ; Tipparaju, Vinod ; Que, Xinyu ; Vetter, Jeffrey S.
Abstract :
Global Address Space (GAS) programming models enable a convenient, shared-memory style addressing model, and support completely asynchronous data movement. Their underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. In this research, we investigate these challenges for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI), on large-scale Cray XT5 systems. We represent the management of communication resources as directed graphs, and propose two new scalable virtual topologies, Meshed Fully Connected Graphs (MFCG) and Cubic Fully Connected Graphs (CFCG), for scalable resource management and contention attenuation. To ensure deadlock-free communication in these multi-dimensional topologies, we design and develop Lowest Dimension First (LDF) forwarding to support fully- or partially-populated MFCG and CFCG on any number of nodes. We have extensively evaluated the benefits of these virtual topologies on the petascale Jaguar Cray XT5 system at Oak Ridge National Laboratory. Our experimental results show that MFCG is the most suitable virtual topology because of its benefits in resource management, contention mitigation, and the resulting benefits to scientific applications.
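The abstract does not spell out how MFCG is built or how LDF forwarding chooses hops. The sketch below illustrates the general idea only, under stated assumptions: nodes are laid out row-major in a fully populated X-by-Y grid; each node holds direct connections (and therefore communication buffers) only to nodes in its own row and column; and LDF forwarding corrects the column coordinate (dimension 0) before the row coordinate (dimension 1), in the spirit of classic dimension-ordered routing. The function names `peers` and `ldf_path` are hypothetical and are not part of the ARMCI API. Partially populated grids, which the paper's LDF forwarding also supports, are not handled here.

```python
def to_coords(rank, x):
    """Map a rank to (dim0, dim1) = (column, row) in a row-major grid that is x nodes wide."""
    return (rank % x, rank // x)


def to_rank(coord, x):
    """Inverse of to_coords: (column, row) -> rank."""
    return coord[1] * x + coord[0]


def peers(rank, x, y):
    """Direct connections of `rank` in a fully populated x-by-y MFCG-style grid
    (hypothetical layout): every node sharing its row or its column."""
    c0, c1 = to_coords(rank, x)
    same_row = [to_rank((i, c1), x) for i in range(x) if i != c0]
    same_col = [to_rank((c0, j), x) for j in range(y) if j != c1]
    return same_row + same_col


def ldf_path(src, dst, x):
    """Lowest-Dimension-First forwarding sketch: fix dimension 0 first,
    then dimension 1. Returns the ranks visited, including src and dst."""
    s0, s1 = to_coords(src, x)
    d0, d1 = to_coords(dst, x)
    path = [src]
    if s0 != d0:
        # Hop within the source's row to reach the destination's column.
        path.append(to_rank((d0, s1), x))
    if s1 != d1:
        # Hop within that column to reach the destination's row.
        path.append(to_rank((d0, d1), x))
    return path


if __name__ == "__main__":
    print(peers(0, 4, 4))      # direct connections of rank 0 in a 4x4 grid
    print(ldf_path(0, 6, 4))   # forwarding path from rank 0 to rank 6
```

Under this assumed layout, each node keeps (X - 1) + (Y - 1) connections instead of N - 1. For 1024 nodes arranged 32 by 32, that is 62 connections per node rather than 1023. Because every message resolves dimension 0 before dimension 1, requests never wait on each other in a cycle across dimensions. This is the usual argument for why dimension-ordered forwarding avoids deadlock.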
Keywords :
buffer storage; directed graphs; resource allocation; shared memory systems; storage allocation; ARMCI; GAS runtime library; Oak Ridge National Laboratory; aggregate remote memory copy interface; communication buffer memory; communication resource management; completely asynchronous data movement; contention attenuation; contention mitigation; cubic fully connected graphs; deadlock-free communication; directed graph; global address space programming model; lowest dimension first forwarding; meshed fully connected graphs; multidimensional topology; petascale Jaguar Cray XT5 system; runtime system; scalable resource management; shared-memory style addressing model; unpredictable communication pattern handling; virtual topology; Hypercubes; Resource management; Runtime; Servers; System recovery; Topology; ARMCI; Contention; GAS; Virtual Topology
Conference_Title :
Parallel Processing (ICPP), 2011 International Conference on
Conference_Location :
Taipei City
Print_ISBN :
978-1-4577-1336-1
ISSN :
0190-3918
DOI :
10.1109/ICPP.2011.38