DocumentCode :
1952234
Title :
A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines
Author :
Cadambi, Srihari ; Durdanovic, Igor ; Jakkula, Venkata ; Sankaradass, Murugan ; Cosatto, Eric ; Chakradhar, Srimat ; Graf, Hans Peter
Author_Institution :
NEC Labs. America, Inc., Princeton, NJ, USA
fYear :
2009
fDate :
5-7 April 2009
Firstpage :
115
Lastpage :
122
Abstract :
We present a massively parallel FPGA-based coprocessor for Support Vector Machines (SVMs), a machine learning algorithm whose applications include recognition tasks such as learning scenes, situations and concepts, and reasoning tasks such as analyzing the recognized scenes and semantics. The coprocessor architecture, targeted at both SVM training and classification, is based on clusters of vector processing elements (VPEs) operating in single-instruction multiple data (SIMD) mode to take advantage of the large amount of data parallelism in the application. We use the FPGA's DSP elements as parallel multiply-accumulators (MACs), a core computation in SVMs. A key feature of the architecture is that it is customized for low-precision arithmetic, which permits one DSP unit to perform two or more MACs in parallel. Low precision also reduces the required number of parallel off-chip memory accesses by packing multiple data words on the FPGA-memory bus. We have built a prototype using an off-the-shelf PCI-based FPGA card with a Xilinx Virtex 5 FPGA and 1 GB DDR2 memory. For SVM training, we observe application-level end-to-end computation speeds of over 9 billion multiply-accumulates per second (GMACs). For SVM classification, using data packing, the application speed increases to 14 GMACs. The FPGA-based system is about 20 times faster than a dual 2.2 GHz Opteron CPU, and dissipates around 10 W of power.
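The abstract notes that low-precision arithmetic lets one DSP unit perform two or more MACs in parallel. The following C sketch illustrates the general operand-packing idea behind such sharing; the 8-bit operands, 16-bit product fields, and unsigned arithmetic are illustrative assumptions, not details taken from the paper.

    /* Hypothetical sketch: two low-precision multiplies folded into one wider
       multiply, analogous to one DSP slice serving two MAC lanes. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t a  = 37;            /* shared multiplicand, e.g. one feature value */
        uint8_t b1 = 200, b2 = 91;  /* two independent multiplier operands */

        /* Pack b1 and b2 into separate 16-bit fields of one word; since
           a * b2 < 2^16, the low field cannot carry into the high field. */
        uint32_t packed = ((uint32_t)b1 << 16) | b2;

        /* One wide multiply yields both products side by side. */
        uint64_t product = (uint64_t)a * packed;

        uint32_t p_hi = (uint32_t)(product >> 16);     /* equals a * b1 */
        uint32_t p_lo = (uint32_t)(product & 0xFFFF);  /* equals a * b2 */

        printf("a*b1 = %u (expect %u), a*b2 = %u (expect %u)\n",
               p_hi, (unsigned)a * b1, p_lo, (unsigned)a * b2);
        return 0;
    }

Signed operands would need guard bits or an offset correction, and the partial products would normally feed accumulators rather than be printed; this only shows why narrower data words let one multiplier do double duty.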
Keywords :
coprocessors; field programmable gate arrays; learning (artificial intelligence); parallel architectures; peripheral interfaces; support vector machines; vector processor systems; DDR2 memory; DSP unit; FPGA memory bus; SVM; Xilinx Virtex 5 FPGA; coprocessor architecture; data packing; data parallelism; dual Opteron 2.2 GHz processor CPU; end-to-end computation; low precision arithmetic; machine learning algorithm; massively parallel FPGA-based coprocessor; off-the-shelf PCI-based FPGA card; parallel multiply-accumulators; parallel off-chip memory access; recognized scenes analysis; scene learning; semantics analysis; single-instruction multiple data; support vector machine training; vector processing elements; Algorithm design and analysis; Computer architecture; Coprocessors; Digital signal processing; Field programmable gate arrays; Layout; Machine learning; Machine learning algorithms; Support vector machine classification; Support vector machines; FPGAs; Hardware Acceleration; Machine Learning; Parallel Architectures; Support Vector Machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Field Programmable Custom Computing Machines, 2009. FCCM '09. 17th IEEE Symposium on
Conference_Location :
Napa, CA
Print_ISBN :
978-0-7695-3716-0
Type :
conf
DOI :
10.1109/FCCM.2009.34
Filename :
5290941