Regional Information Center for Science and Technology (RICEST) - FPGA implementation and evaluation of a simple processor for multi-scalar/vector/matrix instructions

DocumentCode :

234589

Title :

FPGA implementation and evaluation of a simple processor for multi-scalar/vector/matrix instructions

Author :

Soliman, Mostafa I. ; Elsayed, Elsayed A.

Author_Institution :

Comput. Sci. & Inf. Dept., Taibah Univ., Al-Madinah Al-Munawwarah, Saudi Arabia

fYear :

2014

fDate :

19-20 April 2014

Firstpage :

Lastpage :

Abstract :

This paper presents the FPGA implementation of a simple processor architecture for accelerating data-parallel applications. The proposed processor, called SuperSMP, can execute multi-scalar, vector, and matrix instructions on parallel execution datapaths. Four 32-bit instructions are fetched from the instruction cache. The fetched instructions are decoded and their dependencies are checked. Up to four independent scalar instructions can be issued in order to the parallel execution units. Vector/matrix instructions, however, repeatedly issue four vector/matrix operations at a time without dependency checking. On its four parallel execution units, SuperSMP can perform addition, subtraction, multiplication, division, and shifting on scalar/vector/matrix data. Four contiguous 32-bit vector/matrix elements can be loaded from the L2 cache into the matrix register file, or stored from the matrix register file to the L2 cache, per clock cycle. Finally, up to four 32-bit results or loaded values can be written into the scalar/matrix register files. The FPGA implementation of SuperSMP requires 14,032 slices on a Xilinx Virtex-5 (XC5VLX110-3FF1153). The number of LUT flip-flop pairs is 49,398, of which 17,166 have an unused flip-flop, 10,267 have an unused LUT, and 21,965 are fully used. The complexity of SuperSMP is about 3.5 times that of the baseline scalar processor, while its performance is 4.3 to 18.2 times higher than that of the baseline scalar processor.
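The in-order issue step described in the abstract can be illustrated with a minimal sketch. This is not the authors' hardware design: the `Instr` type, the `issue_count` function, the register names, and the exact hazard policy (stop at the first instruction that reads or rewrites a register written earlier in the same fetch group) are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical scalar instruction: one destination, some sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


def issue_count(group: List[Instr], width: int = 4) -> int:
    """Return how many leading instructions of a fetch group can issue together.

    Assumed policy: issue in program order and stop at the first
    instruction that depends on an earlier one in the same group.
    """
    written = set()  # registers written by instructions already issued
    count = 0
    for instr in group[:width]:
        raw_hazard = any(src in written for src in instr.srcs)
        waw_hazard = instr.dest in written
        if raw_hazard or waw_hazard:
            break  # in-order issue: later instructions must wait too
        if instr.dest is not None:
            written.add(instr.dest)
        count += 1
    return count


# Hypothetical fetch group: the third instruction reads r1 and r4,
# which are written by the first two, so only two instructions issue.
group = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r5", "r6")),
    Instr("r7", ("r1", "r4")),
    Instr("r8", ("r9", "r9")),
]
print(issue_count(group))
```

In this model the fourth instruction is independent but still waits, because issue is in order. Vector/matrix instructions would bypass this check entirely, as the abstract states that their operations are issued without checking.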

Keywords :

application specific integrated circuits; field programmable gate arrays; flip-flops; integrated logic circuits; table lookup; FPGA implementation; LUT flip-flop pairs; SuperSMP; XC5VLX110-3FF1153; Xilinx Virtex-5; addition; baseline scalar processor; data-parallel applications; division; matrix register file; multiplication; multiscalar-vector-matrix instructions; scalar-vector-matrix data shifting; simple processor architecture; subtraction; Field programmable gate arrays; Frequency synthesizers; Kernel; Loading; Parallel processing; Table lookup; Vectors; FPGA; data-parallel applications; performance evaluation; vector/matrix processing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Engineering and Technology (ICET), 2014 International Conference on

Conference_Location :

Cairo

Type :

conf

DOI :

10.1109/ICEngTechnol.2014.7016776

Filename :

7016776

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=234589