DocumentCode
659449
Title
A parallel computing platform for training large scale neural networks
Author
Rong Gu ; Furao Shen ; Yihua Huang
Author_Institution
Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
376
Lastpage
384
Abstract
Artificial neural networks (ANNs) have been applied successfully in a variety of pattern recognition and data mining applications. However, training ANNs on large-scale datasets is both data-intensive and computation-intensive, so large-scale ANNs are adopted with reservation because of the time-consuming training required to reach high precision. In this paper, we present cNeural, a customized parallel computing platform that accelerates the training of large-scale neural networks with the backpropagation algorithm. Unlike many existing parallel neural network training systems that work on thousands of training samples, cNeural is designed for fast training on large-scale datasets with millions of training samples. To achieve this goal, cNeural first adopts HBase for storing large-scale training datasets and loading them in parallel. Second, it provides a parallel in-memory computing framework for fast iterative training. Third, it uses a compact, event-driven messaging communication model instead of a heartbeat polling model for instant message delivery. Experimental results show that the overhead of data loading and messaging communication in cNeural is very low, and that cNeural is around 50 times faster than a solution based on Hadoop MapReduce. It also achieves nearly linear scalability and excellent load balancing.
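The training pattern the abstract describes is data-parallel batch backpropagation: each worker computes partial gradients over its in-memory partition of the training samples, and a coordinator aggregates them and applies one weight update per iteration. The following is a minimal, self-contained Java sketch of that pattern under stated assumptions; it is not cNeural's actual code, and the tiny network shape, class and method names, and the thread pool standing in for cluster workers are all illustrative.

import java.util.*;
import java.util.concurrent.*;

public class ParallelBpSketch {
    static final int IN = 4, HID = 8, OUT = 1;   // tiny illustrative network
    static double[][] w1 = new double[HID][IN];  // input -> hidden weights
    static double[][] w2 = new double[OUT][HID]; // hidden -> output weights

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Partial gradients for one partition of samples (batch backpropagation).
    static double[][][] partitionGradients(double[][] xs, double[][] ys, int from, int to) {
        double[][] g1 = new double[HID][IN], g2 = new double[OUT][HID];
        for (int s = from; s < to; s++) {
            double[] h = new double[HID], o = new double[OUT];
            for (int j = 0; j < HID; j++) {      // forward pass: hidden layer
                double z = 0;
                for (int i = 0; i < IN; i++) z += w1[j][i] * xs[s][i];
                h[j] = sigmoid(z);
            }
            for (int k = 0; k < OUT; k++) {      // forward pass: output layer
                double z = 0;
                for (int j = 0; j < HID; j++) z += w2[k][j] * h[j];
                o[k] = sigmoid(z);
            }
            double[] dOut = new double[OUT];
            for (int k = 0; k < OUT; k++) {      // backward pass: output deltas
                dOut[k] = (o[k] - ys[s][k]) * o[k] * (1 - o[k]);
                for (int j = 0; j < HID; j++) g2[k][j] += dOut[k] * h[j];
            }
            for (int j = 0; j < HID; j++) {      // backward pass: hidden deltas
                double e = 0;
                for (int k = 0; k < OUT; k++) e += dOut[k] * w2[k][j];
                double dHid = e * h[j] * (1 - h[j]);
                for (int i = 0; i < IN; i++) g1[j][i] += dHid * xs[s][i];
            }
        }
        return new double[][][]{g1, g2};
    }

    public static void main(String[] args) throws Exception {
        int n = 1000, workers = 4;
        double[][] xs = new double[n][IN], ys = new double[n][OUT];
        Random rnd = new Random(42);             // synthetic stand-in data
        for (int s = 0; s < n; s++) {
            for (int i = 0; i < IN; i++) xs[s][i] = rnd.nextDouble();
            ys[s][0] = xs[s][0] > 0.5 ? 1.0 : 0.0;
        }
        for (double[] row : w1) for (int i = 0; i < IN; i++) row[i] = rnd.nextDouble() - 0.5;
        for (double[] row : w2) for (int j = 0; j < HID; j++) row[j] = rnd.nextDouble() - 0.5;

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        double lr = 0.1;
        for (int iter = 0; iter < 100; iter++) {
            List<Future<double[][][]>> futures = new ArrayList<>();
            int chunk = (n + workers - 1) / workers;
            for (int p = 0; p < workers; p++) {  // fan out one task per partition
                int from = p * chunk, to = Math.min(n, from + chunk);
                futures.add(pool.submit(() -> partitionGradients(xs, ys, from, to)));
            }
            double[][] g1 = new double[HID][IN], g2 = new double[OUT][HID];
            for (Future<double[][][]> f : futures) {  // aggregate partial gradients
                double[][][] g = f.get();
                for (int j = 0; j < HID; j++) for (int i = 0; i < IN; i++) g1[j][i] += g[0][j][i];
                for (int k = 0; k < OUT; k++) for (int j = 0; j < HID; j++) g2[k][j] += g[1][k][j];
            }
            // one synchronized weight update per iteration
            for (int j = 0; j < HID; j++) for (int i = 0; i < IN; i++) w1[j][i] -= lr * g1[j][i] / n;
            for (int k = 0; k < OUT; k++) for (int j = 0; j < HID; j++) w2[k][j] -= lr * g2[k][j] / n;
        }
        pool.shutdown();
    }
}

In the system the abstract describes, the partitions would be loaded in parallel from HBase rather than generated in memory, and the gradient exchange would travel over the event-driven messaging layer between cluster nodes rather than through futures in a single JVM.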
Keywords
backpropagation; iterative methods; learning (artificial intelligence); neural nets; parallel processing; ANNs; HBase; Hadoop MapReduce; artificial neural networks; backpropagation algorithm; cNeural; customized parallel computing platform; data mining; event-driven messaging communication model; fast iterative training; heartbeat polling model; instant messaging delivery; large scale neural network training; large scale training dataset storage; load balancing; parallel in-memory computing framework; parallel neural network training systems; pattern recognition; Biological neural networks; Loading; Neurons; Parallel processing; Training; Training data; big data; distributed storage; fast training; neural network; parallel computing
fLanguage
English
Publisher
ieee
Conference_Titel
2013 IEEE International Conference on Big Data
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691598
Filename
6691598