DocumentCode :
625583
Title :
Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor
Author :
Heinecke, Alexander ; Vaidyanathan, Karthikeyan ; Smelyanskiy, Mikhail ; Kobotov, Alexander ; Dubtsov, Roman ; Henry, Greg ; Shet, Aniruddha G. ; Chrysos, Grigorios ; Dubey, Pradeep
Author_Institution :
Dept. of Inf., Tech. Univ. München, Munich, Germany
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
126
Lastpage :
137
Abstract :
Dense linear algebra has traditionally been used to evaluate the performance and efficiency of new architectures. This trend has continued for the past half decade with the advent of multi-core processors and hardware accelerators. In this paper we describe how several flavors of the Linpack benchmark are accelerated on Intel's recently released Intel® Xeon Phi™ co-processor (code-named Knights Corner) in both native and hybrid configurations. Our native DGEMM implementation takes full advantage of Knights Corner's salient architectural features and successfully utilizes close to 90% of its peak compute capability. Our native Linpack implementation, running entirely on Knights Corner, employs novel dynamic scheduling and achieves close to 80% efficiency, the highest published co-processor efficiency. Like the native version, our single-node hybrid implementation of Linpack also achieves nearly 80% efficiency. Using dynamic scheduling and an enhanced look-ahead scheme, this implementation scales well to a 100-node cluster, on which it achieves over 76% efficiency while delivering a total performance of 107 TFLOPS.
Keywords :
coprocessors; linear algebra; multiprocessing systems; processor scheduling; 100-node cluster; DGEMM implementation; Intel Xeon Phi coprocessor; Linpack benchmark; dense linear algebra; dynamic scheduling; enhanced look-ahead scheme; hardware accelerators; Knights Corner; multicore processors; multinode systems; salient architectural features; single node systems; Bandwidth; Kernel; Matrix decomposition; Prefetching; Registers; Tiles; Vectors; HPL; LU factorization; SIMD; TLP; Xeon Phi; hybrid parallelization; panel factorization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
ISSN :
1530-2075
Print_ISBN :
978-1-4673-6066-1
Type :
conf
DOI :
10.1109/IPDPS.2013.113
Filename :
6569806