Regional Information Center for Science and Technology - Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor

DocumentCode :

625583

Title :

Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor

Author :

Heinecke, Alexander ; Vaidyanathan, Karthikeyan ; Smelyanskiy, Mikhail ; Kobotov, Alexander ; Dubtsov, Roman ; Henry, Greg ; Shet, Aniruddha G. ; Chrysos, Grigorios ; Dubey, Pradeep

Author_Institution :

Dept. of Inf., Tech. Univ. München, Munich, Germany

fYear :

2013

fDate :

20-24 May 2013

Firstpage :

126

Lastpage :

137

Abstract :

Dense linear algebra has been traditionally used to evaluate the performance and efficiency of new architectures. This trend has continued for the past half decade with the advent of multi-core processors and hardware accelerators. In this paper we describe how several flavors of the Linpack benchmark are accelerated on Intel's recently released Intel® Xeon Phi™ co-processor (code-named Knights Corner) in both native and hybrid configurations. Our native DGEMM implementation takes full advantage of Knights Corner's salient architectural features and successfully utilizes close to 90% of its peak compute capability. Our native Linpack implementation running entirely on Knights Corner employs novel dynamic scheduling and achieves close to 80% efficiency, the highest published co-processor efficiency. Similarly to native, our single-node hybrid implementation of Linpack also achieves nearly 80% efficiency. Using dynamic scheduling and an enhanced look-ahead scheme, this implementation scales well to a 100-node cluster, on which it achieves over 76% efficiency while delivering a total performance of 107 TFLOPS.

Keywords :

coprocessors; linear algebra; multiprocessing systems; processor scheduling; 100-node cluster; DGEMM implementation; Intel Xeon Phi coprocessor; Linpack benchmark; dense linear algebra; dynamic scheduling; enhanced look-ahead scheme; hardware accelerators; knights corner; multicore processors; multinode systems; salient architectural features; single node systems; Bandwidth; Kernel; Matrix decomposition; Prefetching; Registers; Tiles; Vectors; HPL; LU factorization; SIMD; TLP; Xeon Phi; hybrid parallelization; panel factorization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on

Conference_Location :

Boston, MA

ISSN :

1530-2075

Print_ISBN :

978-1-4673-6066-1

Type :

conf

DOI :

10.1109/IPDPS.2013.113

Filename :

6569806

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=625583