Title :
Embedded supercomputing in FPGAs with the VectorBlox MXP Matrix Processor
Author :
Severance, Aaron ; Lemieux, Guy G. F.
Author_Institution :
Univ. of British Columbia, Vancouver, BC, Canada
Date :
Sept. 29 - Oct. 4, 2013
Abstract :
Embedded systems frequently use FPGAs to perform highly parallel data processing tasks. However, building such a system usually requires specialized hardware design skills with VHDL or Verilog. Instead, this paper presents the VectorBlox MXP Matrix Processor, an FPGA-based soft processor capable of highly parallel execution. Programmed entirely in C, the MXP is capable of executing data-parallel software algorithms at hardware-like speeds. For example, the MXP running at 200 MHz or higher can implement a multi-tap FIR filter and output 1 element per clock cycle. MXP's parameterized design lets the user specify the amount of parallelism required, ranging from 1 to 128 or more parallel ALUs. Key features of the MXP include a parallel-access scratchpad memory to hold vector data and high-throughput DMA and scatter/gather engines. To provide extreme performance, the processor is expandable with custom vector instructions and custom DMA filters. Finally, the MXP seamlessly ties into existing Altera and Xilinx development flows, simplifying system creation and deployment.
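The multi-tap FIR filter mentioned in the abstract can be written in a data-parallel style, with one whole-vector multiply-accumulate per tap. The following is a minimal sketch in plain, portable C and does not use the MXP programming interface, which this record does not describe. The names fir_vector_style and vec_mac, the 16-bit sample type, and the 32-bit accumulator are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MXP call): acc[i] += coeff * in[i] for a whole
 * vector. On a vector processor, a loop of this shape is the kind of work a
 * single vector instruction could spread across parallel ALUs; here it is
 * ordinary scalar C. */
static void vec_mac(int32_t *acc, const int16_t *in, int16_t coeff, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        acc[i] += (int32_t)coeff * (int32_t)in[i];
    }
}

/* FIR filter y[n] = sum over k of taps[k] * x[n - k], computed only for the
 * outputs whose full input window exists (n = n_taps - 1 .. n_in - 1).
 * out must have room for n_in - n_taps + 1 elements.
 * Returns the number of outputs written (0 if the input is too short). */
size_t fir_vector_style(int32_t *out, const int16_t *in, size_t n_in,
                        const int16_t *taps, size_t n_taps)
{
    assert(n_taps > 0);
    if (n_in < n_taps) {
        return 0;
    }
    size_t n_out = n_in - n_taps + 1;

    /* Clear the accumulators before summing the taps. */
    for (size_t i = 0; i < n_out; i++) {
        out[i] = 0;
    }

    /* One whole-vector multiply-accumulate per tap: tap k is applied to the
     * input shifted so that out[i] sees x[(i + n_taps - 1) - k]. */
    for (size_t k = 0; k < n_taps; k++) {
        vec_mac(out, in + (n_taps - 1 - k), taps[k], n_out);
    }
    return n_out;
}
```

In this formulation the outer loop runs over the taps and the inner loop over the data, so every inner iteration is independent of the others. That independence is what lets such a loop map onto multiple parallel ALUs. How the MXP itself schedules this work is not stated in the abstract.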
Keywords :
C language; FIR filters; embedded systems; field programmable gate arrays; formal specification; hardware description languages; instruction sets; logic design; microprocessor chips; parallel algorithms; parallel machines; Altera development flow; C programming; FPGA-based soft processor; MXP parameterized design; VHDL; VectorBlox MXP matrix processor; Verilog; Xilinx development flow; custom DMA filters; custom vector instructions; data-parallel software algorithm execution; embedded supercomputing; embedded systems; gather engine; hardware design; hardware-like speed; high-throughput DMA; highly parallel data processing task; highly parallel execution; multitap FIR filter; parallel ALU; parallel-access scratchpad memory; parallelism amount specification; scatter engine; system creation; system deployment; vector data; Clocks; Engines; Field programmable gate arrays; Finite impulse response filters; Hardware; Registers; Vectors;
Conference_Title :
Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2013 International Conference on
Conference_Location :
Montreal, QC
DOI :
10.1109/CODES-ISSS.2013.6658993