DocumentCode :
3143367
Title :
Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms
Author :
Cruz, Eduardo Henrique Molina da ; Alves, Marco Antonio Zanata ; Carissimi, Alexandre ; Navaux, Philippe Olivier Alexandre ; Ribeiro, Christiane Pousa ; Méhaut, Jean-François
Author_Institution :
PPGC Grad. Program in Comput. Sci., UFRGS Fed. Univ. of Rio Grande do Sul, Porto Alegre, Brazil
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
551
Lastpage :
558
Abstract :
In parallel programs, the tasks of a given application must cooperate in order to accomplish the required computation. However, the communication time between tasks may differ depending on which cores they execute on and how the memory hierarchy and interconnection are used. The problem is even more important in multi-core machines with NUMA characteristics, since remote accesses impose high overhead, making these machines more sensitive to thread and data mapping. In this context, process mapping is a technique that provides performance gains by improving the use of resources such as interconnections, main memory and cache memory. The problem of finding the best mapping is considered NP-Hard. Furthermore, in shared memory environments, there is the additional difficulty of finding the communication pattern, which is implicit and occurs through memory accesses. This work aims to provide a method for static mapping on NUMA architectures that does not require any prior knowledge of the application. Different metrics were adopted, and a heuristic method based on the Edmonds matching algorithm was used to obtain the mapping. To evaluate our proposal, we use the NAS Parallel Benchmarks (NPB) and two modern multi-core NUMA machines. Results show performance gains of up to 75% compared to the native scheduler and memory allocator of the operating system.
Keywords :
cache storage; computational complexity; memory architecture; multi-threading; shared memory systems; Edmonds matching algorithm; NAS parallel benchmark; NP-hard; NUMA architecture; cache memory; communication pattern; communication time; hierarchical multicore platform; memory access trace; memory hierarchy; multicore NUMA machine; multicore machine; parallel program; process mapping; remote access; shared memory environment; static mapping; thread mapping; Benchmark testing; Instruction sets; Linux; Measurement; Memory management; Multicore processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.197
Filename :
6008876