Regional Information Center for Science and Technology - Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems

DocumentCode :

1199493

Title :

Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems

Author :

Zhuo, Ling ; Prasanna, Viktor K.

Author_Institution :

Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA

Volume :

Issue :

fYear :

2007

fDate :

4/1/2007 12:00:00 AM

Firstpage :

433

Lastpage :

448

Abstract :

The abundant hardware resources on current reconfigurable computing systems provide new opportunities for high-performance parallel implementations of scientific computations. In this paper, we study designs for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications, on reconfigurable computing systems. We first analyze the design trade-offs in implementing this kernel. These trade-offs arise from the inherent parallelism of matrix multiplication and from resource constraints, including the number of configurable slices, the size of on-chip memory, and the available memory bandwidth. We propose three parameterized algorithms that can be tuned according to the problem size and the available hardware resources. Our algorithms employ a linear array architecture with simple control logic. This architecture effectively utilizes the available resources and reduces routing complexity. The processing elements (PEs) used in our algorithms are modular, so it is easy to embed floating-point units into them. Experimental results on a Xilinx Virtex-II Pro XC2VP100 show that our algorithms achieve good scalability and high sustained GFLOPS performance. We also implement our algorithms on the Cray XD1, a high-end reconfigurable computing system that employs both general-purpose processors and reconfigurable devices. Our algorithms achieve a sustained performance of 2.06 GFLOPS on a single node of the XD1.
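The linear-array idea mentioned in the abstract can be illustrated with a small software simulation. The sketch below is not one of the paper's three algorithms; it is a minimal illustration that assumes each PE keeps a fixed subset of the columns of B in local memory while elements of A are passed along the array one at a time. The names `linear_array_matmul`, `num_pes`, and `gflops` are hypothetical, and the throughput helper uses the standard 2n^3 operation count for an n x n product.

```python
def linear_array_matmul(A, B, num_pes):
    """Simulate C = A * B on a linear array of num_pes processing elements.

    Assumption for illustration: PE p stores columns p, p + num_pes, ... of B
    and accumulates the matching columns of C.
    """
    n, k, m = len(A), len(B), len(B[0])
    owned = [list(range(p, m, num_pes)) for p in range(num_pes)]
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for t in range(k):
            a = A[i][t]                      # one element of A enters the array
            for p in range(num_pes):         # and is forwarded from PE to PE
                for j in owned[p]:
                    C[i][j] += a * B[t][j]   # multiply-accumulate inside PE p
    return C


def gflops(n, seconds):
    """Sustained GFLOPS for an n x n product, counting 2 * n**3 operations."""
    return 2.0 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(linear_array_matmul(A, B, num_pes=2))
    print(gflops(1000, 1.0))
```

In this sketch, changing `num_pes` only changes how the columns are divided among PEs, not the result, which loosely mirrors the idea of tuning a parameterized design to the available hardware.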

Keywords :

field programmable gate arrays; floating point arithmetic; matrix multiplication; natural sciences computing; parallel algorithms; reconfigurable architectures; control logic; field-programmable gate arrays; floating-point matrix multiplication; hardware resource constraint; linear array architecture; on-chip memory bandwidth; parameterized algorithm; reconfigurable computing system; scientific computing; Bandwidth; Concurrent computing; Field programmable gate arrays; Hardware; Kernel; Logic arrays; Parallel algorithms; Parallel processing; Routing; Scientific computing; computations on matrices; reconfigurable hardware

fLanguage :

English

Journal_Title :

Parallel and Distributed Systems, IEEE Transactions on

Publisher :

IEEE

ISSN :

1045-9219

Type :

jour

DOI :

10.1109/TPDS.2007.1001

Filename :

4118686

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1199493