Title :
CUDA Based Fast Implementation of Very Large Matrix Computation
Author :
Sun, Yinghong ; Tong, Yuanman
Author_Institution :
Dept. of Comput. Sci. & Technol., Hunan Int. Econ. Univ., Changsha, China
Abstract :
CUDA (Compute Unified Device Architecture) acceleration of very large scale matrix-vector and matrix-matrix multiplication is presented in this paper. The intrinsic parallelism in the matrix computations is exploited thoroughly. By dividing the entire matrix computation into multiple sub-groups, scalable performance improvement can be achieved using multiple GPUs. The key operations are accelerated on the GPU, and the CUDA-related data storage, thread hierarchy, and kernel implementation are proposed. Several optimization methods are also employed, including coalesced global memory access, on-the-fly reduction, bank-conflict-free shared memory usage, loop unrolling, removal of unnecessary synchronization, and concurrent execution on the device through streams. Experimental results show that a maximum speedup of about 8.5 times can be achieved for CUDA-accelerated matrix multiplication.
Keywords :
concurrency control; coprocessors; mathematics computing; matrix multiplication; parallel processing; shared memory systems; CUDA; GPU; bank conflict free shared memory usage; coalesced global memory access; compute unified device architecture acceleration; concurrent execution; data storage; intrinsic parallelism; kernel implementation; loop unrolling; matrix-matrix multiplication; matrix-vector multiplication; on-the-fly reduction; optimization method; threads hierarchy; unnecessary synchronization removal; very large matrix computation; Acceleration; Graphics processing unit; Instruction sets; Kernel; Performance evaluation; Sparse matrices; parallel acceleration
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-9110-0
Electronic_ISBN :
978-0-7695-4287-4
DOI :
10.1109/PDCAT.2010.45