Regional Information Center for Science and Technology - An improved parallel singular value algorithm and its implementation for multicore hardware

DocumentCode :

692928

Title :

An improved parallel singular value algorithm and its implementation for multicore hardware

Author :

Haidar, Azzam ; Kurzak, Jakub ; Luszczek, Piotr

Author_Institution :

Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA

fYear :

2013

fDate :

17-22 Nov. 2013

Abstract :

The enormous gap between the computational capabilities of today's CPUs and the speed of off-chip communication poses extreme challenges to the development of numerical software that is both scalable and high-performing. In this article, we describe a successful methodology for addressing these challenges, starting with our algorithm design, continuing through kernel optimization and tuning, and finishing with our programming model. Together, these steps led to the development of a scalable, high-performance Singular Value Decomposition (SVD) solver. We developed a set of highly optimized kernels and combined them with advanced optimization techniques: fine-grain, cache-contained kernels; a task-based approach; and a hybrid execution and scheduling runtime. All of these significantly increase the performance of our SVD solver. Our results demonstrate a many-fold performance increase over currently available software. In particular, our software is two times faster than Intel's Math Kernel Library (MKL), a highly optimized implementation from the hardware vendor, when all the singular vectors are requested; it achieves a 5-fold speed-up when only 20% of the vectors are computed; and it is up to 10 times faster when only the singular values are required.
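The keywords name reduction to bidiagonal form as the central step of the solver. As a point of reference only, here is a minimal sketch of the classical one-stage Householder (Golub-Kahan) bidiagonalization in plain Python, assuming a dense list-of-lists matrix with m >= n. It is not the authors' implementation: the abstract describes fine-grain, cache-contained, task-scheduled kernels, whereas this loop sweeps the whole trailing matrix at every step. The function names `householder` and `bidiagonalize` are hypothetical, chosen for illustration.

```python
import math


def householder(x):
    """Return (v, beta) such that (I - beta * v v^T) x = alpha * e1."""
    norm_x = math.sqrt(sum(t * t for t in x))
    if norm_x == 0.0:
        return [0.0] * len(x), 0.0  # nothing to annihilate
    alpha = -math.copysign(norm_x, x[0])  # sign choice avoids cancellation
    v = list(x)
    v[0] -= alpha
    return v, 2.0 / sum(t * t for t in v)


def bidiagonalize(a):
    """Reduce an m x n matrix (m >= n) to upper bidiagonal B = U^T A V.

    Returns (d, e): the diagonal and superdiagonal of B. The orthogonal
    factors U and V are applied implicitly and never accumulated, which
    corresponds to the singular-values-only case.
    """
    m, n = len(a), len(a[0])
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    a = [list(row) for row in a]  # work on a copy, leave the input intact
    for k in range(n):
        # Left reflector: zero column k below the diagonal.
        v, beta = householder([a[i][k] for i in range(k, m)])
        for j in range(k, n):
            s = sum(v[i - k] * a[i][j] for i in range(k, m))
            for i in range(k, m):
                a[i][j] -= beta * v[i - k] * s
        # Right reflector: zero row k beyond the superdiagonal.
        if k < n - 2:
            w, gamma = householder([a[k][j] for j in range(k + 1, n)])
            for i in range(k, m):
                s = sum(a[i][j] * w[j - k - 1] for j in range(k + 1, n))
                for j in range(k + 1, n):
                    a[i][j] -= gamma * s * w[j - k - 1]
    d = [a[k][k] for k in range(n)]
    e = [a[k][k + 1] for k in range(n - 1)]
    return d, e


if __name__ == "__main__":
    d, e = bidiagonalize([[2.0, -1.0, 0.0], [1.0, 3.0, 2.0], [0.0, 1.0, 4.0]])
    print("diagonal:", d)
    print("superdiagonal:", e)
```

Because orthogonal transformations preserve singular values, a bidiagonal SVD routine applied to `(d, e)` would yield the singular values of the original matrix. Requesting singular vectors additionally requires accumulating or back-applying the reflectors, which plausibly explains why the reported speed-ups depend on how many vectors are requested.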

Keywords :

multiprocessing systems; operating system kernels; parallel algorithms; singular value decomposition; MKL; Math Kernel Library; SVD solver; advanced optimization techniques; cache-contained kernels; fine-grain kernels; hardware vendor; kernel optimization; kernel tuning; multicore hardware; parallel singular value algorithm; scalable high-performance singular value decomposition solver; singular vectors; Eigenvalues and eigenfunctions; Heuristic algorithms; Kernel; Layout; Processor scheduling; Symmetric matrices; Vectors; Performance; Reduction to bidiagonal; Singular Value Decomposition; eigenvalues and eigenvectors; task parallelism;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Conference_Location :

Denver, CO

Print_ISBN :

978-1-4503-2378-9

Type :

conf

DOI :

10.1145/2503210.2503292

Filename :

6877523

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=692928