Title :
Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory
Author :
Cruz, Eduardo H M ; Diener, Matthias ; Navaux, Philippe O A
Author_Institution :
Inf. Inst., Fed. Univ. of Rio Grande do Sul, Porto Alegre, Brazil
Abstract :
The communication latency between the cores in multiprocessor architectures differs depending on the memory hierarchy and the interconnections. As the number of cores per chip and the number of threads per core increase, this difference in communication latency grows. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, communication is implicit and occurs through accesses to shared variables. For this reason, it is difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this paper, we introduce a new lightweight mechanism to detect the communication pattern of threads using the Translation Lookaside Buffer (TLB). Our mechanism relies entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time-consuming task, such as simulation, is required. We evaluated our mechanism with the NAS Parallel Benchmarks (NPB) and achieved an accurate representation of the communication patterns. Using the detected communication patterns, we generated thread mappings using a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3% and reduced the number of cache misses by up to 31.1%.
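The abstract does not spell out how TLB contents become a communication pattern, so the following is only a minimal sketch under stated assumptions: each thread's TLB is modeled as a set of virtual page numbers, and two threads are treated as communicating in proportion to the number of pages present in both TLBs. The function names and the sample data are hypothetical. The pairing step is a simple greedy stand-in, not the Edmonds-based heuristic the authors describe.

```python
from itertools import combinations


def communication_matrix(tlbs):
    """Count, for each pair of threads, the pages present in both TLBs.

    Assumption: a page cached in two TLBs indicates shared data.
    """
    n = len(tlbs)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = len(tlbs[i] & tlbs[j])
        matrix[i][j] = matrix[j][i] = shared
    return matrix


def greedy_pairs(matrix):
    """Pair threads by descending communication volume.

    This is a greedy approximation, not Edmonds matching.
    """
    n = len(matrix)
    edges = sorted(
        ((matrix[i][j], i, j) for i, j in combinations(range(n), 2)),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    paired, pairs = set(), []
    for _, i, j in edges:
        if i not in paired and j not in paired:
            pairs.append((i, j))
            paired.update((i, j))
    return pairs


# Hypothetical TLB snapshots: one set of virtual page numbers per thread.
tlbs = [{1, 2, 3, 4}, {3, 4, 5}, {10, 11, 12}, {11, 12, 13, 1}]
matrix = communication_matrix(tlbs)
pairs = greedy_pairs(matrix)
```

Each resulting pair is a candidate for placement on cores that share a cache level, which is the kind of mapping decision the abstract refers to. An optimal pairing would require a maximum-weight matching algorithm rather than this greedy pass.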
Keywords :
parallel processing; shared memory systems; NAS parallel benchmarks; NPB; TLB; communication latency; communication pattern; cores per chip; light weight mechanism; map threads; memory access; memory hierarchy; multiprocessor architecture cores; parallel applications; shared memory; source code; translation lookaside buffer; Complexity theory; Hardware; Instruction sets; Memory management; Message systems; Operating systems; Radiation detectors; Cache Misses; Interconnections; Thread mapping
Conference_Title :
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0975-2
DOI :
10.1109/IPDPS.2012.56