DocumentCode
2320447
Title
GPU Performance Enhancement via Communication Cost Reduction: Case Studies of Radix Sort and WSN Relay Node Placement Problem
Author
Lee, Che-Rung ; Lo, Shih-Hsiang ; Chen, Nan-Hsi ; Chung, Yeh-Ching ; Chung, I-Hsin
Author_Institution
Dept. of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
fYear
2012
fDate
13-16 May 2012
Firstpage
132
Lastpage
139
Abstract
As the computational power of graphics processing units (GPUs) increases, data transmission becomes the major performance bottleneck. In this study, we investigate two techniques, data streaming and data compression, for reducing communication cost on GPUs. Data streaming overlaps communication with computation, whereas data compression reduces the amount of data transferred between memory spaces. Although both techniques increase computation cost, overall performance can still improve because communication cost falls. We demonstrate the effectiveness of the two techniques in two case studies: radix sort and 3-star, a deployment algorithm for wireless sensor networks. For radix sort, we present a new algorithm that mixes the MSD and LSD algorithms and employs data streaming. It is 25% faster than the fastest GPU radix sort implementation currently available in the public domain. For the 3-star algorithm, the GPU implementation runs several hundred times faster than the CPU code. Adding data streaming and data compression, which yields a hybrid CPU-GPU algorithm, provides an additional 54% performance improvement over the GPU implementation. Data compression not only reduces communication cost but also shortens computation time, which enables further performance gains.
Keywords
cost reduction; data communication; data compression; graphics processing units; performance evaluation; wireless sensor networks; 3-star algorithm; CPU code; CPU-GPU algorithm; GPU performance enhancement; LSD algorithms; MSD algorithms; WSN relay node placement problem; communication cost reduction; computational power; data streaming; data transmission; graphics processing unit; memory spaces; radix sort; Approximation algorithms; Instruction sets; Kernel; Partitioning algorithms; Relays; GPU
fLanguage
English
Publisher
IEEE
Conference_Titel
2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Conference_Location
Ottawa, ON
Print_ISBN
978-1-4673-1395-7
Type
conf
DOI
10.1109/CCGrid.2012.16
Filename
6217414