Title :
PSO Efficient Implementation on GPUs Using Low Latency Memory
Author :
Silva, Eric H. M. ; Bastos Filho, Carmelo J. A.
Author_Institution :
Univ. de Pernambuco (UPE), Recife, Brazil
Abstract :
This paper proposes an efficient implementation of the Particle Swarm Optimization (PSO) algorithm that uses the shared memory available in Graphics Processing Units (GPUs) on CUDA (Compute Unified Device Architecture) platforms. In our proposal, each dimension of each particle is mapped to a thread, and the threads are executed in parallel within a GPU block. Since a GPU block has a maximum number of threads that can run in parallel, we propose to use multiple sub-swarms. Each sub-swarm is executed in its own GPU block, with the aim of maximizing data alignment and avoiding instruction branching. We also propose two communication mechanisms and two topologies that allow the sub-swarms to exchange information and collaborate through the GPU global memory. The results for 8 sub-swarms, each with 32 particles and 32 dimensions, show speedups of up to 100 times over the serial implementation and up to 5 times over a state-of-the-art PSO implementation for CUDA. Our proposal allows PSO algorithms to be deployed in continuous optimization problems with many input variables, a type of problem that is very common in engineering.
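The mapping described in the abstract (one thread per particle dimension, one block per sub-swarm, collaboration through global memory) can be illustrated with a small sequential sketch. This is not the authors' CUDA code: it emulates blocks and threads with plain Python loops, and the objective function (sphere), the coefficients W, C1 and C2, and the one-way ring exchange between sub-swarms are assumptions chosen only for illustration. The abstract does not specify the paper's actual topologies or communication mechanisms.

```python
import random

N_SUBSWARMS, N_PARTICLES, N_DIMS = 8, 32, 32  # sizes quoted in the abstract
W, C1, C2 = 0.72, 1.49, 1.49                  # assumed PSO coefficients


def sphere(x):
    """Assumed benchmark objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def thread_to_coords(thread_id):
    """Map a flat thread index inside one block to (particle, dimension)."""
    return divmod(thread_id, N_DIMS)


def run(iterations=50, seed=0):
    rng = random.Random(seed)
    # Per-block state; on a GPU this would live in each block's shared memory.
    pos = [[[rng.uniform(-5.0, 5.0) for _ in range(N_DIMS)]
            for _ in range(N_PARTICLES)] for _ in range(N_SUBSWARMS)]
    vel = [[[0.0] * N_DIMS for _ in range(N_PARTICLES)]
           for _ in range(N_SUBSWARMS)]
    pbest = [[list(p) for p in block] for block in pos]
    pbest_f = [[sphere(p) for p in block] for block in pos]

    # One (fitness, position) slot per sub-swarm; stands in for global memory.
    global_best = []
    for b in range(N_SUBSWARMS):
        i = min(range(N_PARTICLES), key=lambda k: pbest_f[b][k])
        global_best.append((pbest_f[b][i], list(pbest[b][i])))
    initial_best = min(f for f, _ in global_best)

    for _ in range(iterations):
        for b in range(N_SUBSWARMS):  # blocks would run concurrently on a GPU
            # Assumed ring exchange: compare own best with the left neighbour's.
            left = global_best[(b - 1) % N_SUBSWARMS]
            guide = min(global_best[b], left, key=lambda t: t[0])[1]
            # Each "thread" updates exactly one dimension of one particle.
            for t in range(N_PARTICLES * N_DIMS):
                p, d = thread_to_coords(t)
                r1, r2 = rng.random(), rng.random()
                vel[b][p][d] = (W * vel[b][p][d]
                                + C1 * r1 * (pbest[b][p][d] - pos[b][p][d])
                                + C2 * r2 * (guide[d] - pos[b][p][d]))
                pos[b][p][d] += vel[b][p][d]
            # After all threads of the block finish, evaluate and update bests.
            for p in range(N_PARTICLES):
                f = sphere(pos[b][p])
                if f < pbest_f[b][p]:
                    pbest_f[b][p] = f
                    pbest[b][p] = list(pos[b][p])
                    if f < global_best[b][0]:
                        global_best[b] = (f, list(pos[b][p]))

    return initial_best, min(f for f, _ in global_best)
```

Calling `run()` returns the best fitness before and after the iterations. With 32 particles of 32 dimensions, each emulated block uses 1024 thread indices, which matches the per-block thread limit of many CUDA devices and may be why the abstract pairs those sizes.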
Keywords :
graphics processing units; parallel architectures; particle swarm optimisation; CUDA platforms; Compute Unified Device Architecture platforms; GPU blocks; GPU global memory; PSO algorithm; continuous optimization problems; graphic processing units; low latency memory; particle swarm optimization; shared memory; Central Processing Unit; Computer architecture; Graphics processing units; Instruction sets; Kernel; Particle swarm optimization; Surges; CUDA; Graphics Processing Units; Parallel Computing; Particle Swarm Optimization; Shared Memory; Swarm intelligence;
Journal_Title :
Latin America Transactions, IEEE (Revista IEEE America Latina)
DOI :
10.1109/TLA.2015.7112023