DocumentCode
167527
Title
Analysis of MPI Shared-Memory Communication Performance from a Cache Coherence Perspective
Author
Putigny, Bertrand ; Ruelle, Benoit ; Goglin, Brice
Author_Institution
Inria Bordeaux - Sud-ouest, Bordeaux, France
fYear
2014
fDate
19-23 May 2014
Firstpage
1238
Lastpage
1247
Abstract
Shared-memory MPI communication is an important part of the overall performance of parallel applications. However, understanding the behavior of these data transfers is difficult because of the combined complexity of modern memory architectures (with multiple levels of caches and complex cache coherence protocols), of MPI implementations, and of application needs. We analyze shared-memory MPI communication from a cache coherence perspective through a new memory model. This model captures the characteristics of the memory architecture using micro-benchmarks that expose the limitations of the memory accesses involved in the data transfer. We model the performance of intra-node communication without requiring complex analytical models. The advantage of this approach is that it does not require deep knowledge of rarely documented hardware features, such as caching policies or prefetchers, which make modeling modern memory subsystems nearly infeasible. Our qualitative analysis based on these results leads to a better understanding of shared-memory communication performance for scientific computing. We then discuss possible optimizations that MPI implementers could use, such as buffer reuse order, cache flushing, and non-temporal instructions.
Keywords
cache storage; memory architecture; message passing; shared memory systems; MPI implementations; MPI shared-memory communication performance; buffer reuse order; cache flushing; caching policies; complex cache coherence protocols; data transfers; intra-node communication; memory accesses; memory architecture characteristics; memory model; memory subsystems; microbenchmarks; nontemporal instructions; optimizations; parallel applications; prefetchers; scientific computing; Benchmark testing; Coherence; Predictive models; Program processors; Protocols; Receivers; Throughput; MPI; cache coherence; memory model; shared-memory communication;
fLanguage
English
Publisher
IEEE
Conference_Title
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location
Phoenix, AZ
Print_ISBN
978-1-4799-4117-9
Type
conf
DOI
10.1109/IPDPSW.2014.139
Filename
6969521