Title :
A highly scalable Restricted Boltzmann Machine FPGA implementation
Author :
Kim, Sang Kyun ; McAfee, Lawrence C. ; McMahon, Peter L. ; Olukotun, Kunle
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., Stanford, CA, USA
Date :
Aug. 31, 2009 - Sept. 2, 2009
Abstract :
Restricted Boltzmann machines (RBMs), the building block of the newly popular deep belief networks (DBNs), are a promising new tool for machine learning practitioners. However, future research into applications of DBNs is hampered by the considerable computation that training requires. In this paper, we describe a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks. Our design uses a highly efficient, fully pipelined architecture based on 16-bit arithmetic for performing RBM training on an FPGA. We show that 16-bit arithmetic precision is sufficient, which allows us to use the FPGA's embedded hardware multiply-and-add (MADD) units. We present performance results showing that a speedup of 25-30X can be achieved over an optimized software implementation on a high-end CPU.
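For context on the computation being accelerated: RBMs are commonly trained with one-step contrastive divergence (CD-1), whose inner loops are dominated by the matrix multiply-accumulate operations that map naturally onto MADD units. The sketch below is a hypothetical NumPy illustration of a single CD-1 weight update, not the authors' pipeline; it uses float16 only to echo the 16-bit-precision claim, whereas an FPGA design would more likely use 16-bit fixed-point arithmetic.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=np.float16(0.1)):
    """One CD-1 step; W is (visible x hidden), v0 a batch of visible vectors."""
    # Positive phase: hidden probabilities and a binary sample from the data.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(np.float16)
    # Negative phase: one Gibbs step back to the visible layer, then hidden again.
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # Gradient <v0 h0> - <v1 h1>: the MADD-heavy outer-product accumulations.
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / np.float16(v0.shape[0])
    return (W + lr * dW).astype(np.float16)

# Toy usage: 16 visible units, 8 hidden units, batch of 4 binary vectors.
W = (rng.standard_normal((16, 8)) * 0.01).astype(np.float16)
v0 = (rng.random((4, 16)) < 0.5).astype(np.float16)
W = cd1_update(W, v0)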
Keywords :
Boltzmann machines; belief networks; field programmable gate arrays; learning (artificial intelligence); multiplying circuits; pipeline arithmetic; 16-bit arithmetic precision; DBN; FPGA implementation; MADD; RBM; deep belief network; embedded hardware multiply-and-add unit; high-end CPU; machine learning; pipeline architecture; restricted Boltzmann machine; software implementation; Acceleration; Arithmetic; Artificial intelligence; Computer architecture; Field programmable gate arrays; Machine learning; Modems; Neural networks; Neurons; Space exploration
Conference_Title :
2009 International Conference on Field Programmable Logic and Applications (FPL 2009)
Conference_Location :
Prague
Print_ISBN :
978-1-4244-3892-1
Electronic_ISSN :
1946-1488
DOI :
10.1109/FPL.2009.5272262