DocumentCode :
3090113
Title :
A Comparative Evaluation of Parallel Programming Models for Shared-Memory Architectures
Author :
Sanchez, Luis Miguel ; Fernandez, Javier ; Sotomayor, Rafael ; Garcia, J. Daniel
Author_Institution :
Comput. Sci. Dept., Univ. Carlos III de Madrid, Leganés, Spain
fYear :
2012
fDate :
10-13 July 2012
Firstpage :
363
Lastpage :
370
Abstract :
Nowadays, most commercially available off-the-shelf (COTS) computers include hardware features that increase the performance of parallel general-purpose threads (hyper-threading, multicore, ccNUMA architectures) or SIMD kernels (CPU vector instructions, GPUs). The purpose of this paper is to perform a comparative evaluation of several parallel programming models, each of which is suited to exploiting some of these features and each of which requires a different level of programming skill. Four parallel programming models (OpenMP, Intel TBB, Intel ArBB, and CUDA) have been selected. The idea is to cover a wide spectrum of programming models and most of the parallel hardware features included in modern computers. On the one hand, OpenMP and TBB exploit parallel threads running on multicore systems. On the other hand, ArBB combines multicore parallel threads and multicore SIMD features with a simpler programming model, and CUDA exploits the SIMD features of GPU hardware. The results obtained with the benchmarks used in this paper suggest that OpenMP and TBB have lower performance than ArBB and CUDA, and that ArBB performance tends to be comparable with CUDA performance in most cases (although it is normally lower). Thus, there is evidence that a carefully designed top-range multicore and multisocket architecture can be comparable in performance with top-range GPU cards for many applications, with the advantage of a simpler programming model.
Keywords :
memory architecture; multi-threading; parallel architectures; performance evaluation; shared memory systems; COTS; CPU vector instructions; CUDA; GPU cards; Intel ArBB; Intel TBB; OpenMP; SIMD kernels; ccNUMA architectures; commercially available off-the-shelf; hyper threading; multicore SIMD features; multicore parallel threads; multisocket architecture; parallel general-purpose threads; parallel programming models; programming skills; shared-memory architectures; Benchmark testing; Computer architecture; Computers; Graphics processing unit; Instruction sets; Parallel processing; Programming; GPGPU; Multicore; Parallel computing; SIMD;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on
Conference_Location :
Leganés
Print_ISBN :
978-1-4673-1631-6
Type :
conf
DOI :
10.1109/ISPA.2012.54
Filename :
6280314