• DocumentCode
    234589
  • Title
    FPGA implementation and evaluation of a simple processor for multi-scalar/vector/matrix instructions
  • Author
    Soliman, Mostafa I. ; Elsayed, Elsayed A.
  • Author_Institution
    Comput. Sci. & Inf. Dept., Taibah Univ., Al-Madinah Al-Munawwarah, Saudi Arabia
  • fYear
    2014
  • fDate
    19-20 April 2014
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    This paper presents an FPGA implementation of a simple processor architecture for accelerating data-parallel applications. The proposed processor, called SuperSMP, can execute multi-scalar, vector, and matrix instructions on parallel execution datapaths. Four 32-bit (4×32-bit) instructions are fetched from the instruction cache, decoded, and checked for dependencies. Up to four independent scalar instructions can be issued in order to the parallel execution units, whereas a vector/matrix instruction repeatedly issues four vector/matrix operations at a time without dependency checking. Using its four parallel execution units, SuperSMP can perform addition, subtraction, multiplication, division, and shifting on scalar, vector, and matrix data. Per clock cycle, four contiguous 32-bit vector/matrix elements can be loaded from the L2 cache into the matrix register file or stored from the matrix register file to the L2 cache. Finally, up to four 32-bit results or loaded data words can be written into the scalar/matrix register files. The FPGA implementation of SuperSMP requires 14,032 slices on a Xilinx Virtex-5 XC5VLX110-3FF1153. It uses 49,398 LUT flip-flop pairs, of which 17,166 have an unused flip-flop, 10,267 have an unused LUT, and 21,965 are fully used. The complexity of SuperSMP is about 3.5 times that of the baseline scalar processor, while its performance ranges from 4.3 to 18.2 times higher than that of the baseline scalar processor.
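    The issue policy summarized in the abstract (in-order issue of up to four independent scalar instructions, versus unchecked issue of four vector/matrix operations at a time) can be illustrated with a minimal Python sketch. The abstract does not state the exact dependency rules, so this sketch assumes that an instruction cannot issue alongside an earlier instruction in the same group that writes one of its source registers (RAW) or its destination register (WAW), and that issue stops at the first such instruction. It also assumes one group of four vector/matrix operations is issued per cycle with no stalls. The names `Instr`, `issue_group`, `vector_issue_cycles`, and `ISSUE_WIDTH` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ISSUE_WIDTH = 4  # four parallel execution units, per the abstract


@dataclass(frozen=True)
class Instr:
    """Hypothetical scalar instruction: opcode, destination and source registers."""
    op: str
    dest: Optional[int]
    srcs: Tuple[int, ...]


def issue_group(fetched: List[Instr]) -> List[Instr]:
    """Return the in-order prefix of `fetched` that can issue in one cycle.

    Assumed rule: an instruction may not issue together with an earlier
    instruction in the group that writes one of its sources (RAW) or its
    destination (WAW). Issue stops at the first dependent instruction, so
    it and everything after it wait for a later cycle (in-order issue).
    """
    issued: List[Instr] = []
    written = set()  # destination registers written by already-issued instructions
    for ins in fetched[:ISSUE_WIDTH]:
        raw = any(s in written for s in ins.srcs)
        waw = ins.dest is not None and ins.dest in written
        if raw or waw:
            break
        issued.append(ins)
        if ins.dest is not None:
            written.add(ins.dest)
    return issued


def vector_issue_cycles(length: int, width: int = ISSUE_WIDTH) -> int:
    """Cycles to issue `length` vector/matrix element operations when `width`
    operations are issued per cycle without dependency checks (no stalls assumed)."""
    return -(-length // width)  # ceiling division


if __name__ == "__main__":
    group = [
        Instr("add", 1, (2, 3)),
        Instr("sub", 4, (5, 6)),
        Instr("mul", 7, (1, 8)),  # reads r1, written by "add" in the same group
        Instr("shl", 9, (10,)),
    ]
    print([i.op for i in issue_group(group)])  # prints ['add', 'sub']
    print(vector_issue_cycles(64))  # prints 16
```

Stopping at the first dependent instruction, rather than skipping over it, is what keeps the sketch in-order: a later independent instruction (here `shl`) is not allowed to overtake the stalled `mul`.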
  • Keywords
    application specific integrated circuits; field programmable gate arrays; flip-flops; integrated logic circuits; table lookup; FPGA implementation; LUT flip-flop pairs; SuperSMP; XC5VLX110-3FF1153; Xilinx Virtex-5; addition; baseline scalar processor; data-parallel applications; division; matrix register file; multiplication; multiscalar-vector-matrix instructions; scalar-vector-matrix data shifting; simple processor architecture; subtraction; Field programmable gate arrays; Frequency synthesizers; Kernel; Loading; Parallel processing; Table lookup; Vectors; FPGA; data-parallel applications; performance evaluation; vector/matrix processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering and Technology (ICET), 2014 International Conference on
  • Conference_Location
    Cairo
  • Type
    conf
  • DOI
    10.1109/ICEngTechnol.2014.7016776
  • Filename
    7016776