Title :
Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
Author :
Nere, Andrew ; Hashmi, Atif ; Lipasti, Mikko
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Wisconsin-Madison, Madison, WI, USA
Abstract :
Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. Such attributes have once again sparked an interest in creating learning algorithms that aspire to reverse-engineer many of the abilities of the brain. In this paper we describe a GPGPU-accelerated extension to an intelligent learning model inspired by the structural and functional properties of the mammalian neocortex. Our cortical network, like the brain, exhibits massive amounts of processing parallelism, making today´s GPGPUs a highly attractive and readily-available hardware accelerator for such a model. Furthermore, we consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose optimizations such as a software work-queue structure and pipelining the hierarchical layers of the cortical network to mitigate such problems. Our analysis provides important insight into the GPU architecture details including the number of cores, the memory system, and the global thread scheduler. Additionally, we create a runtime profiling tool for our parallel learning algorithm which proportionally distributes the cortical network across the host CPU as well as multiple GPUs, whether homogeneous or heterogeneous, that may be available to the system. Using the profiling tool with these optimizations on Nvidia´s CUDA framework, we achieve up to 60× speedup over a single-threaded CPU implementation of the model.
Keywords :
computer graphic equipment; coprocessors; parallel processing; CUDA framework; GPGPU-accelerated extension; GPU architecture; cortically inspired learning algorithm; global thread scheduler; hardware accelerator; heterogeneous multiGPU system; human neocortex; intelligent learning model; mammalian neocortex; memory system; multiple kernel-launch overhead; parallel computing device; parallel learning algorithm; runtime profiling tool; software work-queue structure; Biological system modeling; Brain models; Computational modeling; Graphics processing unit; Neurons;
Conference_Titel :
Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-61284-372-8
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2011.88