Regional Information Center for Science and Technology - Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study

DocumentCode :

3588939

Title :

Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study

Author :

Kaixi Hou ; Hao Wang ; Wu-Chun Feng

Author_Institution :

Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA

fYear :

2014

Firstpage :

273

Lastpage :

282

Abstract :

Moore's Law effectively doubles the compute power of a microprocessor every 24 months. Over the past decade, however, this doubling in performance has been due to the doubling of the number of cores in a microprocessor rather than clock speed increases. Perhaps nowhere is this more evident than with the Intel Xeon Phi coprocessor. This many-core architecture exhibits not only massive inter-core parallelism but also intra-core parallelism via a wider SIMD width. However, for data-intensive applications, the bandwidth constraint of MIC hinders the full utilization of computational resources, especially when massive parallelism is required to process big data sets. Furthermore, the process of optimizing the performance on such platforms is complex and requires architectural expertise. To evaluate the efficacy of the Intel MIC ecosystem for "big data" applications, we use the Floyd-Warshall algorithm as a representative case study for graph applications. Our study offers evidence that traditional compiler optimizations can deliver parallel programmability to the masses on the Intel Xeon Phi platform. That is, developers can straightforwardly create manycore codes in the Intel Xeon Phi ecosystem that deliver significant speedup. The optimizations include reordering data-access patterns, adjusting loop structures, vectorizing branches, and using OpenMP directives. We start from the default serial algorithm and apply the above optimizations one by one. Overall, we achieve a 281.7-fold speedup over the default serial version. When compared with the default OpenMP Floyd-Warshall parallel implementation, we still achieve a 6.4-fold speedup. We also observe that the identically optimized code on MIC can outperform its CPU counterpart by up to 3.2-fold.
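The abstract names the optimizations but the record does not include the paper's code. As a rough illustration only, the following C sketch shows a textbook serial Floyd-Warshall loop next to a variant that replaces the branch with a select (so the inner loop can vectorize) and adds OpenMP directives. The function names `fw_serial` and `fw_openmp`, the row-major `float` layout, and the assumption of a zero diagonal with no negative cycles are choices made here for illustration, not details taken from the paper.

```c
/* Baseline: dist is an n*n row-major matrix, updated in place. */
void fw_serial(float *dist, int n) {
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (dist[i * n + k] + dist[k * n + j] < dist[i * n + j])
                    dist[i * n + j] = dist[i * n + k] + dist[k * n + j];
}

/* Illustrative variant (assumed, not the authors' code):
 * - dist[i][k] is hoisted out of the inner loop,
 * - the if-branch becomes a select so the j loop can be vectorized,
 * - rows are distributed across threads for each fixed k.
 * Assumes dist[k][k] == 0 and no negative cycles, so row k is
 * unchanged during iteration k and can be skipped and shared read-only. */
void fw_openmp(float *dist, int n) {
    for (int k = 0; k < n; k++) {
        const float *row_k = dist + (long)k * n;
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            if (i == k)
                continue; /* row k cannot improve via k itself */
            float *row_i = dist + (long)i * n;
            const float d_ik = row_i[k];
            #pragma omp simd
            for (int j = 0; j < n; j++) {
                float via_k = d_ik + row_k[j];
                row_i[j] = via_k < row_i[j] ? via_k : row_i[j];
            }
        }
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and `fw_openmp` runs serially with the same result. Missing edges can be encoded as a large finite value such as `1e30f` so that sums do not overflow.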

Keywords :

Big Data; microprocessor chips; parallel programming; program compilers; Floyd-Warshall algorithm; Intel MIC ecosystem; Intel Xeon Phi coprocessor; Intel Xeon Phi ecosystem; Intel Xeon Phi platform; Moore law; SIMD width; architectural expertise; bandwidth constraint; big data applications; big data sets; clock speed; compiler optimizations; computational resources; data access patterns; data-intensive applications; graph applications; inter-core parallelism; intra-core parallelism; many core architecture; massive parallelism; microprocessor; parallel programmability; Bandwidth; Coprocessors; Hardware; Instruction sets; Multicore processing; Optimization; Parallel processing; Floyd-Warshall; Intel Xeon Phi; MIC; graph; manycore; programmability;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on

ISSN :

1530-2016

Type :

conf

DOI :

10.1109/ICPPW.2014.44

Filename :

7103462

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3588939