DocumentCode :
3588939
Title :
Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study
Author :
Kaixi Hou ; Hao Wang ; Wu-Chun Feng
Author_Institution :
Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
fYear :
2014
Firstpage :
273
Lastpage :
282
Abstract :
Moore's Law effectively doubles the compute power of a microprocessor every 24 months. Over the past decade, however, this doubling in performance has been due to the doubling of the number of cores in a microprocessor rather than clock speed increases. Perhaps nowhere is this more evident than with the Intel Xeon Phi coprocessor. This many-core architecture exhibits not only massive inter-core parallelism but also intra-core parallelism via a wider SIMD width. However, for data-intensive applications, the bandwidth constraint of MIC hinders the full utilization of computational resources, especially when massive parallelism is required to process big data sets. Furthermore, the process of optimizing the performance on such platforms is complex and requires architectural expertise. To evaluate the efficacy of the Intel MIC ecosystem for "big data" applications, we use the Floyd-Warshall algorithm as a representative case study for graph applications. Our study offers evidence that traditional compiler optimizations can deliver parallel programmability to the masses on the Intel Xeon Phi platform. That is, developers can straightforwardly create manycore codes in the Intel Xeon Phi ecosystem that deliver significant speedup. The optimizations include reordering data-access patterns, adjusting loop structures, vectorizing branches, and using OpenMP directives. We start from the default serial algorithm and apply the above optimizations one by one. Overall, we achieve a 281.7-fold speedup over the default serial version. When compared with the default OpenMP Floyd-Warshall parallel implementation, we still achieve a 6.4-fold speedup. We also observe that the identically optimized code on MIC can outperform its CPU counterpart by up to 3.2-fold.
Keywords :
Big Data; microprocessor chips; parallel programming; program compilers; Floyd-Warshall algorithm; Intel MIC ecosystem; Intel Xeon Phi coprocessor; Intel Xeon Phi ecosystem; Intel Xeon Phi platform; Moore law; SIMD width; architectural expertise; bandwidth constraint; big data applications; big data sets; clock speed; compiler optimizations; computational resources; data access patterns; data-intensive applications; graph applications; inter-core parallelism; intra-core parallelism; many core architecture; massive parallelism; microprocessor; parallel programmability; Bandwidth; Coprocessors; Hardware; Instruction sets; Multicore processing; Optimization; Parallel processing; Floyd-Warshall; Intel Xeon Phi; MIC; graph; manycore; programmability
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on
ISSN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2014.44
Filename :
7103462