DocumentCode :
3226745
Title :
Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform
Author :
Ying Zhang ; Saizheng Zhang
Author_Institution :
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2013
fDate :
4-6 Nov. 2013
Firstpage :
71
Lastpage :
78
Abstract :
In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g., NVIDIA GPUs). Carefully designed layer-wise strategies are applied to integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are used in the deep architecture's propagation processes. In our experiments, these kernels save 70% of the time on average compared with the kernels in NVIDIA's CUBLAS library (widely used by many other neural network toolkits), and help our parallel deep architecture outperform neural structures using CUBLAS kernels on practical problems.
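The abstract describes matrix operation kernels invoked during the propagation passes of a deep architecture. As a minimal sketch of that setting, the following NumPy code shows layer-wise forward propagation driven by matrix products; the layer sizes, sigmoid activation, and the `forward` helper are illustrative assumptions, not the paper's implementation (whose kernels are custom GPU code).

```python
import numpy as np

def forward(layers, x):
    """Propagate input x through a stack of (W, b) layers.

    Each step is dominated by a matrix product -- the operation the
    paper's custom GPU kernels accelerate relative to CUBLAS.
    """
    a = x
    for W, b in layers:
        z = W @ a + b                  # matrix operation kernel's workload
        a = 1.0 / (1.0 + np.exp(-z))   # element-wise activation (assumed sigmoid)
    return a

rng = np.random.default_rng(0)
# Two hypothetical layers: 4 -> 3 -> 2 units
layers = [(rng.standard_normal((3, 4)), np.zeros(3)),
          (rng.standard_normal((2, 3)), np.zeros(2))]
out = forward(layers, rng.standard_normal(4))
print(out.shape)  # (2,)
```

On a GPU, each `W @ a` would be dispatched to a matrix-multiplication kernel; the paper's claim is that its hand-tuned kernels complete such calls faster than the CUBLAS equivalents.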
Keywords :
graphics processing units; learning (artificial intelligence); matrix algebra; neural nets; parallel architectures; parallel programming; NVIDIA CUBLAS library; NVIDIA GPU; deep architecture propagation process; fast matrix operation kernels; flexible layer structures; layer-wise strategies; neural network toolkits; neural structure; neural training-testing system; optimized deep learning architecture; parallel computing platform; parallel deep architecture; Computer architecture; Graphics processing units; Integrated circuits; Kernel; Libraries; Training; Vectors; GPU; deep architecture; deep learning; kernel; matrix operation; parallel computing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
Conference_Location :
Herndon, VA
ISSN :
1082-3409
Print_ISBN :
978-1-4799-2971-9
Type :
conf
DOI :
10.1109/ICTAI.2013.21
Filename :
6735232