Title :
Economical Two-fold Working Precision Matrix Multiplication on Consumer-Level CUDA GPUs
Author :
Fujimoto, Noriyuki
Author_Institution :
Dept. of Math. & Inf. Sci., Osaka Prefecture Univ., Sakai, Japan
Abstract :
A dot product that is faithfully rounded after being computed "as if" in K-fold working precision (K ≥ 2) is known to be computable using only floating-point numbers defined in the IEEE 754 floating-point standard. This paper presents a CUDA GPU implementation of two-fold working precision matrix multiplication based on this dot product computation method. Experimental results on a GeForce GTX580 and a GTX560Ti show that the proposed implementation achieves 1.84 to 1.95 times higher GFLOPS performance in two-fold working precision than CUBLAS dgemm achieves in double precision on a Tesla C2070 high-end GPU. The proposed implementation can be used to obtain higher performance in pseudo double precision on low-cost consumer-level GPUs whose native double-precision performance is limited.
Keywords :
computer graphic equipment; coprocessors; matrix multiplication; parallel architectures; GFLOPS performance; K-fold working precision; Tesla C2070 GPU; compute unified device architecture; consumer-level CUDA GPU; dot product computation method; graphics processing unit; precision matrix multiplication; Clocks; Documentation; Graphics processing unit; Instruction sets; Memory management; Programming; CUDA; GPGPU; error free transformation; matrix multiplication;
Conference_Title :
Architecture and Multi-Core Applications (WAMCA), 2011 Second Workshop on
Conference_Location :
Vitoria, Espirito Santo
Print_ISBN :
978-1-4673-0221-0
DOI :
10.1109/WAMCA.2011.18