• DocumentCode
    1952234
  • Title

    A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines

  • Author

    Cadambi, Srihari ; Durdanovic, Igor ; Jakkula, Venkata ; Sankaradass, Murugan ; Cosatto, Eric ; Chakradhar, Srimat ; Graf, Hans Peter

  • Author_Institution
    NEC Labs. America, Inc., Princeton, NJ, USA
  • fYear
    2009
  • fDate
    5-7 April 2009
  • Firstpage
    115
  • Lastpage
    122
  • Abstract
    We present a massively parallel FPGA-based coprocessor for Support Vector Machines (SVMs), a machine learning algorithm whose applications include recognition tasks such as learning scenes, situations and concepts, and reasoning tasks such as analyzing the recognized scenes and semantics. The coprocessor architecture, targeted at both SVM training and classification, is based on clusters of vector processing elements (VPEs) operating in single-instruction multiple data (SIMD) mode to take advantage of large amounts of data parallelism in the application. We use the FPGA's DSP elements as parallel multiply-accumulators (MACs), a core computation in SVMs. A key feature of the architecture is that it is customized to low-precision arithmetic, which permits one DSP unit to perform two or more MACs in parallel. Low precision also reduces the required number of parallel off-chip memory accesses by packing multiple data words on the FPGA-memory bus. We have built a prototype using an off-the-shelf PCI-based FPGA card with a Xilinx Virtex 5 FPGA and 1 GB DDR2 memory. For SVM training, we observe application-level end-to-end computation speeds of over 9 billion multiply-accumulates per second (GMACs). For SVM classification, using data packing, the application speed increases to 14 GMACs. The FPGA-based system is about 20 times faster than a dual Opteron 2.2 GHz processor CPU, and dissipates around 10 W of power.
  • Keywords
    coprocessors; field programmable gate arrays; learning (artificial intelligence); parallel architectures; peripheral interfaces; support vector machines; vector processor systems; DDR2 memory; DSP unit; FPGA memory bus; SVM; Xilinx Virtex 5 FPGA; coprocessor architecture; data packing; data parallelism; dual Opteron 2.2 GHz processor CPU; end-to-end computation; low precision arithmetic; machine learning algorithm; massively parallel FPGA-based coprocessor; off-the-shelf PCI based FPGA card; parallel multiply-accumulators; parallel off-chip memory access; recognized scenes analysis; scene learning; semantics analysis; single-instruction multiple data; support vector machine training; vector processing elements; Algorithm design and analysis; Computer architecture; Coprocessors; Digital signal processing; Field programmable gate arrays; Layout; Machine learning; Machine learning algorithms; Support vector machine classification; Support vector machines; FPGAs; Hardware Acceleration; Machine Learning; Parallel Architectures; Support Vector Machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Field Programmable Custom Computing Machines, 2009. FCCM '09. 17th IEEE Symposium on
  • Conference_Location
    Napa, CA
  • Print_ISBN
    978-0-7695-3716-0
  • Type

    conf

  • DOI
    10.1109/FCCM.2009.34
  • Filename
    5290941
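
The abstract's remark that low precision lets one DSP unit perform two or more MACs in parallel can be illustrated with a small software sketch. This is not the paper's hardware design: the function name `packed_dual_dot`, the unsigned 8-bit operand range, and the 24-bit field width are all assumptions chosen for illustration. The idea shown is that two small operands packed into one wide word, multiplied by a shared operand, yield both products in disjoint bit fields of a single result.

```python
def packed_dual_dot(a, b, x, bits=8, shift=24):
    """Compute dot(a, x) and dot(b, x) using one multiply per element.

    Illustrative sketch only (hypothetical helper, not from the paper).
    a, b and x must hold unsigned integers smaller than 2**bits.
    """
    assert len(a) == len(b) == len(x)
    limit = 1 << bits
    assert all(0 <= v < limit for v in list(a) + list(b) + list(x))
    # The low field must hold the full accumulated sum without
    # overflowing into the high field.
    assert len(x) * (limit - 1) ** 2 < (1 << shift)

    acc = 0
    for ai, bi, xi in zip(a, b, x):
        packed = ai | (bi << shift)   # two operands in one wide word
        acc += packed * xi            # one multiply-accumulate does both
    mask = (1 << shift) - 1
    return acc & mask, acc >> shift   # (dot(a, x), dot(b, x))


print(packed_dual_dot([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

A real DSP block has fixed operand and accumulator widths and typically works with signed values, so a hardware version would need narrower fields and sign correction; Python's unbounded integers hide those constraints here.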