Author_Institution :
Electr. Eng. Dept., South Valley Univ., Aswan, Egypt
Abstract :
Data-parallel kernels dominate the computational workload in a wide variety of demanding applications. Because the fundamental data structures of many data-parallel applications are scalars, vectors, and matrices, this paper proposes a simple matrix processor (SMP) for executing scalar/vector/matrix instructions. Instead of using accelerators to improve the performance of data-parallel applications, SMP uses a multi-level ISA to express parallelism to common hardware. SMP extends the well-known 5-stage pipeline with a matrix register file and a matrix control unit in the decode stage. Scalar/vector/matrix instructions are fetched from the instruction cache, decoded, and executed on the same execution datapath. On a Xilinx Virtex-5 FPGA targeting the xc5vlx50-3ff1153 device, SMP requires 4,138 slices, with 5,853 slice flip-flops and 12,840 4-input LUTs (12,540 for logic and 300 for RAMs). Moreover, the FPGA implementation of SMP operates at 108 MHz. Our results show speedups of 2.84, 3.82, 3.88, and 7.43 times over scalar execution on SAXPY, vector addition, vector scaling, and matrix-matrix multiplication, respectively.
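For orientation, the four benchmark kernels named above can be written as plain scalar loops, which is roughly the kind of baseline the reported speedups are measured against. This is only an illustrative sketch: the function names (saxpy, vector_add, vector_scale, matmul) and the use of Python lists are assumptions made here, not the paper's code, ISA, or benchmark setup.

```python
# Scalar reference versions of the four kernels named in the abstract.
# All names are hypothetical and chosen for illustration only.

def saxpy(a, x, y):
    """Return a*x + y, computed element by element."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def vector_add(x, y):
    """Return the element-wise sum of two equal-length vectors."""
    return [xi + yi for xi, yi in zip(x, y)]

def vector_scale(a, x):
    """Return the vector x with every element multiplied by the scalar a."""
    return [a * xi for xi in x]

def matmul(A, B):
    """Return the product of an n-by-k matrix A and a k-by-m matrix B."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            # Inner product of row i of A with column j of B.
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc
    return C
```

Each loop iteration here maps to one scalar operation. A vector or matrix instruction would cover a whole row or block of such iterations at once, which is the kind of parallelism the multi-level ISA is meant to express.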
Keywords :
data structures; field programmable gate arrays; matrix algebra; FPGA implementation; SMP; Xilinx Virtex-5 FPGA; data-parallel applications; data-parallel kernels; matrix control unit; matrix register file; scalar/vector/matrix instructions; simplified matrix processor; xc5vlx50-3ff1153 device; multi-level ISA; pipelining; vector/matrix processing