DocumentCode
166660
Title
High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design
Author
Jose, Jithin ; Hamidouche, Khaled ; Lu, Xiaoyi ; Potluri, Sreeram ; Zhang, Jie ; Tomko, Karen ; Panda, Dhabaleswar K.
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear
2014
fDate
22-26 Sept. 2014
Firstpage
10
Lastpage
18
Abstract
Intel Many Integrated Core (MIC) architectures are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. Partitioned Global Address Space (PGAS) programming models, such as OpenSHMEM, provide an attractive approach for developing scientific applications with irregular communication characteristics by abstracting a shared memory address space and offering one-sided communication semantics. However, the current OpenSHMEM standard does not efficiently support heterogeneous memory architectures such as Xeon Phi. Host and Xeon Phi cores have different memory capacities and compute characteristics, yet the global symmetric memory allocation in the current OpenSHMEM standard mandates that the same amount of memory be allocated on every process. In this paper, we propose extensions to overcome this restriction, along with high performance runtime-level designs for efficient communication involving Xeon Phi processors. We also redesign applications to demonstrate the effectiveness of the proposed designs and extensions. Experimental evaluations indicate a 4X to 7X reduction in OpenSHMEM data movement operation latencies and a 6X to 11X improvement in the performance of collective operations. Application evaluations in symmetric mode indicate performance improvements of 28% at 1,024 processes. Further, application redesigns using the proposed extensions provide several orders of magnitude of performance improvement compared to the symmetric mode. To the best of our knowledge, this is the first research work that proposes high performance runtime designs for OpenSHMEM on Intel Xeon Phi clusters.
Keywords
memory architecture; microprocessor chips; multiprocessing systems; parallel processing; shared memory systems; Host cores; Intel Xeon Phi clusters; Intel many integrated core; MIC architectures; OpenSHMEM standard; PGAS programming models; Xeon Phi cores; Xeon Phi processors; application codesign; compute characteristics; data movement operation latencies; global symmetric memory allocation; heterogeneous memory architectures; high compute density; high performance OpenSHMEM; high performance runtime-level designs; irregular communication characteristics; memory capacities; modern supercomputer architectures; one-sided communication semantics; partitioned global address space; runtime designs; scientific applications; shared memory address space; Bandwidth; Coprocessors; Electronics packaging; Memory management; Resource management; Runtime;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing (CLUSTER), 2014 IEEE International Conference on
Conference_Location
Madrid
Type
conf
DOI
10.1109/CLUSTER.2014.6968754
Filename
6968754