DocumentCode :
692928
Title :
An improved parallel singular value algorithm and its implementation for multicore hardware
Author :
Haidar, Azzam ; Kurzak, Jakub ; Luszczek, Piotr
Author_Institution :
Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
fYear :
2013
fDate :
17-22 Nov. 2013
Firstpage :
1
Lastpage :
12
Abstract :
The enormous gap between the high-performance capabilities of today's CPUs and the speed of off-chip communication poses extreme challenges to the development of numerical software that is scalable and achieves high performance. In this article, we describe a successful methodology for addressing these challenges, starting with our algorithm design, continuing through kernel optimization and tuning, and finishing with our programming model. Together, these lead to the development of a scalable, high-performance Singular Value Decomposition (SVD) solver. We developed a set of highly optimized kernels and combined them with advanced optimization techniques featuring fine-grain, cache-contained kernels, a task-based approach, and a hybrid execution and scheduling runtime, all of which significantly increase the performance of our SVD solver. Our results demonstrate a many-fold performance increase compared to currently available software. In particular, our software is two times faster than Intel's Math Kernel Library (MKL), a highly optimized implementation from the hardware vendor, when all the singular vectors are requested; it achieves a 5-fold speed-up when only 20% of the vectors are computed; and it is up to 10 times faster if only the singular values are required.
Keywords :
multiprocessing systems; operating system kernels; parallel algorithms; singular value decomposition; MKL; Math Kernel Library; SVD solver; advanced optimization techniques; cache-contained kernels; fine-grain kernels; hardware vendor; kernel optimization; kernel tuning; multicore hardware; parallel singular value algorithm; scalable high-performance singular value decomposition solver; singular vectors; Eigenvalues and eigenfunctions; Heuristic algorithms; Kernel; Layout; Processor scheduling; Symmetric matrices; Vectors; Performance; Reduction to bidiagonal; Singular Value Decomposition; eigenvalues and eigenvectors; task parallelism;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
Conference_Location :
Denver, CO
Print_ISBN :
978-1-4503-2378-9
Type :
conf
DOI :
10.1145/2503210.2503292
Filename :
6877523