Title :
Codevelopment of Multi-level ISA and hardware for an efficient matrix processor
Author :
Soliman, Mostafa I. ; Al-Junaid, Abdulmajid F.
Author_Institution :
Electr. Eng. Dept., South Valley Univ., Aswan, Egypt
Abstract :
The instruction set architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. A multi-level ISA is proposed to communicate data parallelism to the hardware (processor) explicitly and compactly, instead of relying on dynamic extraction by complex hardware or static extraction by sophisticated compiler techniques. This paper presents the codevelopment of a multi-level ISA and hardware for an efficient matrix processor called Mat-Core. Mat-Core extends a general-purpose scalar processor with a matrix unit for processing vector/matrix data. To hide memory latency, the matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. As in vector architectures, the data computation unit is organized in parallel lanes. However, on these parallel lanes Mat-Core can execute scalar-matrix, vector-matrix, and matrix-matrix instructions in addition to scalar-vector and vector-vector instructions. Mat-Core leads to a compiler model that is efficient in terms of both performance and executable code size. On a four-lane Mat-Core, our results show performance of about 1.6, 2.1, 4.1, and 6.4 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
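To make the FLOPs-per-cycle figures easier to interpret, the sketch below writes out the four benchmark kernels as plain Python loops, each returning its result together with its FLOP count under the usual convention of one FLOP per multiply and one per add (n for scalar-vector, 2n for SAXPY, 2nm for an n x m vector-matrix product, 2nkm for matrix-matrix). It also shows one possible way elements could be spread over four lanes. This is a hypothetical reference model, not the Mat-Core ISA or its hardware: the names `lane_partition`, `scal`, `saxpy`, `vecmat`, and `matmul` are illustrative, and the interleaved element-to-lane mapping is an assumption borrowed from typical vector architectures, not something the abstract specifies.

```python
# Hypothetical reference model, NOT the Mat-Core ISA or its hardware.
# Each kernel returns (result, flop_count), counting one FLOP per multiply
# and one per add (the conventional count; assumed, not taken from the paper).

NUM_LANES = 4  # the four-lane configuration evaluated in the abstract


def lane_partition(n, num_lanes=NUM_LANES):
    """Assumed interleaved mapping: element i is handled by lane i % num_lanes."""
    return [list(range(lane, n, num_lanes)) for lane in range(num_lanes)]


def scal(a, x):
    """Scalar-vector multiplication y = a*x: n multiplies."""
    return [a * xi for xi in x], len(x)


def saxpy(a, x, y):
    """SAXPY y = a*x + y: n multiplies plus n adds."""
    return [a * xi + yi for xi, yi in zip(x, y)], 2 * len(x)


def vecmat(x, A):
    """Vector-matrix product z = x*A, with x of length n and A of size n x m."""
    n, m = len(A), len(A[0])
    z = [0.0] * m
    for i in range(n):
        for j in range(m):
            z[j] += x[i] * A[i][j]  # one multiply and one add per (i, j)
    return z, 2 * n * m


def matmul(A, B):
    """Matrix-matrix product C = A*B, with A of size n x k and B of size k x m."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            for j in range(m):
                C[i][j] += A[i][p] * B[p][j]  # one multiply and one add
    return C, 2 * n * k * m


if __name__ == "__main__":
    print(lane_partition(10))  # which elements each of the 4 lanes would process
    _, flops = saxpy(2.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(flops)
```

Dividing a kernel's FLOP count by its measured cycle count yields the per-cycle figures quoted in the abstract; the cycle counts themselves come from the paper's evaluation and are not reproduced here. Note also that scalar-vector multiplication and SAXPY perform a constant number of FLOPs per element loaded, while matrix-matrix multiplication performs O(k) FLOPs per element, which is consistent with (though not by itself an explanation of) the higher figures reported for the matrix kernels.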
Keywords :
digital arithmetic; instruction sets; matrix multiplication; parallel architectures; Mat-Core; address generation; compiler model; data computation; data computation unit; data parallelism; data queues; general-purpose scalar processor; matrix data; matrix processor; matrix-matrix instructions; matrix-matrix multiplication; multilevel instruction set architecture; scalar-matrix instructions; scalar-vector instructions; scalar-vector multiplication; vector-matrix instructions; vector-matrix multiplication; vector-vector instructions; Clocks; Computer architecture; Concurrent computing; Data mining; Delay; Hardware; Instruction sets; Parallel processing; Program processors; Programming profession; SystemC implementation; high performance computing; multi-level ISA; performance evaluation; vector/matrix processing;
Conference_Title :
Computer Engineering & Systems, 2009. ICCES 2009. International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-5842-4
Electronic_ISBN :
978-1-4244-5843-1
DOI :
10.1109/ICCES.2009.5383281