DocumentCode :
652520
Title :
The Exploration of Pervasive and Fine-Grained Parallel Model Applied on Intel Xeon Phi Coprocessor
Author :
Calvin, C. ; Fan Ye ; Petiton, S.
Author_Institution :
DM2S, Commissariat à l'Énergie Atomique, CEA, France
fYear :
2013
fDate :
28-30 Oct. 2013
Firstpage :
166
Lastpage :
173
Abstract :
In this paper we investigate dissimilar multithreading programming paradigms on x86 architectures, studying the recently released Intel Xeon Phi coprocessor and commonly used Intel Xeon processors, as well as the NVIDIA K20 GPU, which represents the cutting-edge general-purpose graphics processing unit. The numerical algorithm selected for the study is the power method, which is widely used to compute the dominant eigenvalue of a matrix. This work focuses on dense linear algebra. Frequently used multi-core and many-core parallelism techniques include OpenMP, Intel Cilk Plus, and Intel Threading Building Blocks (TBB), along with optimized computing libraries such as the Intel Math Kernel Library (MKL) and the NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library. Optimized implementations of these techniques were separately applied to the aforementioned architectures. Because a single programming model may not satisfy growing performance demands, we also explored possible combinations of these models. The study shows that the hybrid pattern of multithreading and data parallelism via explicit vectorization maximizes the performance on x86 architectures, which allows us to obtain 80% of the sustainable peak performance in double precision on the Intel Many Integrated Core (MIC) Architecture. In single precision, this figure reaches 96%. In addition, this approach delivers reasonable performance while requiring the least development time. The number of iterations until convergence is roughly the same on the CPU and GPU architectures. The GPU performs better for small matrix sizes, whereas the Intel Xeon Phi coprocessor excels for large sizes, with better scalability.
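For readers unfamiliar with the algorithm named in the abstract, the following is a minimal serial sketch of the power method on a dense matrix. It is not the authors' implementation (which the abstract says relies on OpenMP, Cilk Plus, TBB, MKL and cuBLAS); the function names `mat_vec` and `power_method`, the tolerance, and the stopping rule are assumptions chosen for illustration only.

```python
import math


def mat_vec(a, x):
    """Dense matrix-vector product y = A x (illustrative, unoptimized)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def power_method(a, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue of a square dense matrix.

    Hypothetical helper: the stopping rule (relative change of the
    Rayleigh quotient) is an assumption, not taken from the paper.
    Returns (eigenvalue, unit eigenvector estimate, iterations used).
    """
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n          # unit-norm start vector
    eigenvalue = 0.0
    for iteration in range(1, max_iter + 1):
        y = mat_vec(a, x)                 # one matrix-vector product per step
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("A x is zero; choose another start vector")
        # Rayleigh quotient x^T A x, valid because x has unit norm.
        new_eigenvalue = sum(xi * yi for xi, yi in zip(x, y))
        x = [v / norm for v in y]         # normalize for the next step
        if abs(new_eigenvalue - eigenvalue) <= tol * abs(new_eigenvalue):
            return new_eigenvalue, x, iteration
        eigenvalue = new_eigenvalue
    return eigenvalue, x, max_iter


if __name__ == "__main__":
    lam, vec, iters = power_method([[2.0, 1.0], [1.0, 3.0]])
    print(lam, iters)
```

The matrix-vector product dominates the cost of each iteration, which is presumably why a dense workload of this kind lends itself to comparing threading models and vectorization; in this sketch that product is the single `mat_vec` call inside the loop.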
Keywords :
graphics processing units; mathematics computing; matrix algebra; multi-threading; multiprocessing systems; ubiquitous computing; Building Blocks; Intel Cilk Plus; Intel Math Kernel Library; Intel Threading; Intel Xeon Phi coprocessor; NVIDIA CUDA basic linear algebra subroutines library; NVIDIA K20 GPU; OpenMP; data parallelism; dense linear algebra; dissimilar multithreading programming paradigms; fine-grained parallel model; general purpose graphics processing unit; iteration method; many-core processor parallelism techniques; multicore processor parallelism techniques; multithreading; optimized computing libraries; pervasive model; Arrays; Computational modeling; Coprocessors; Graphics processing units; Kernel; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2013 Eighth International Conference on
Conference_Location :
Compiegne
Type :
conf
DOI :
10.1109/3PGCIC.2013.31
Filename :
6681224