Title :
The Random Address Shift to Reduce the Memory Access Congestion on the Discrete Memory Machine
Author :
Nakano, Koji; Matsumae, Susumu; Ito, Yasuaki
Author_Institution :
Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan
Abstract :
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of shared-memory access by a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks constituting a shared memory, and the w threads of a warp access them at the same time. Memory access requests destined for the same memory bank, however, are processed sequentially. Hence, reducing the memory access congestion, that is, the maximum number of memory access requests destined for the same bank, is crucial for developing efficient algorithms. The memory access congestion takes a value between 1 and w. The main contribution of this paper is a novel algorithmic technique called the random address shift that reduces the memory access congestion. We show that, for any memory access requests by a warp of w threads, including maliciously chosen ones, the expected memory access congestion is O(log w / log log w). Simulation results show that the expected congestion for w = 32 threads is only 3.436. Since malicious memory access requests all destined for the same bank incur congestion 32, our random address shift technique substantially reduces the memory access congestion. We have applied the random address shift technique to matrix transpose algorithms. Experimental results on a GeForce GTX Titan show that the technique is practical and accelerates straightforward matrix transpose algorithms by a factor of 5.
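A minimal CUDA sketch of the random address shift idea as described in the abstract (illustrative only; the kernel and names such as transposeShifted, shift, and W are ours, not the authors' implementation). Row j of a W x W shared-memory tile is stored with a random cyclic shift shift[j], so the column reads of a straightforward transpose, which would otherwise all hit one bank with congestion W, are spread across the banks at random:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define W 32  // warp size = number of shared memory banks

// Transpose a W x W matrix with one warp, storing row j of the
// shared-memory tile cyclically shifted by shift[j] banks.
__global__ void transposeShifted(const float *in, float *out, const int *shift)
{
    __shared__ float tile[W][W];
    const int tx = threadIdx.x;  // a single warp of W threads

    // Write phase: thread tx stores in(row, tx) at column (tx + shift[row]) % W.
    // The shift permutes columns within a row, so writes remain conflict-free.
    for (int row = 0; row < W; ++row)
        tile[row][(tx + shift[row]) % W] = in[row * W + tx];
    __syncthreads();

    // Read phase: thread tx fetches in(tx, col) from its shifted location.
    // Without the shift, all W threads would access bank col (congestion W);
    // with independent random shifts, the requested banks are random.
    for (int col = 0; col < W; ++col)
        out[col * W + tx] = tile[tx][(col + shift[tx]) % W];
}

int main()
{
    float h_in[W * W], h_out[W * W];
    int h_shift[W];
    for (int i = 0; i < W * W; ++i) h_in[i] = (float)i;
    for (int j = 0; j < W; ++j) h_shift[j] = rand() % W;  // random row shifts

    float *d_in, *d_out;
    int *d_shift;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMalloc(&d_shift, sizeof(h_shift));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemcpy(d_shift, h_shift, sizeof(h_shift), cudaMemcpyHostToDevice);

    transposeShifted<<<1, W>>>(d_in, d_out, d_shift);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    printf("out(1,0) = %.0f (expect in(0,1) = 1)\n", h_out[1 * W]);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_shift);
    return 0;
}

In the read phase, the bank requested by thread tx at step col is (col + shift[tx]) mod W; with the W shifts drawn independently and uniformly at random, the maximum number of threads requesting one bank is the maximum load of a balls-into-bins experiment, consistent with the expected O(log w / log log w) congestion bound stated in the abstract.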
Keywords :
DRAM chips; computational complexity; graphics processing units; matrix algebra; parallel algorithms; parallel architectures; shared memory systems; CUDA-enabled GPUs; GeForce GTX Titan; algorithmic technique; discrete memory machine; matrix transpose algorithms; memory access congestion reduction; random address shift; shared memory; streaming multiprocessor; theoretical parallel computing model; w memory banks; w threads; Computational modeling; Graphics processing units; Instruction sets; Memory management; Phase change random access memory; Pipelines; Writing; CUDA; GPU; memory access congestion; memory bank conflicts; randomized technique
Conference_Title :
2013 First International Symposium on Computing and Networking (CANDAR)
Conference_Location :
Matsuyama
Print_ISBN :
978-1-4799-2795-1
DOI :
10.1109/CANDAR.2013.21