Analysis of high-performance floating-point arithmetic on FPGAs

Author

Govindu, Gokul ; Zhuo, Ling ; Choi, Seonil ; Prasanna, Viktor

Author_Institution

Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA

fYear

2004

fDate

26-30 April 2004

Firstpage

149

Abstract

Summary form only given. FPGAs are increasingly being used in the high-performance and scientific computing community to implement floating-point-based hardware accelerators. We analyze the floating-point multiplier and adder/subtractor units by considering the number of pipeline stages of the units as a parameter and use throughput/area as the metric. We achieve throughput rates of more than 240 MHz (200 MHz) for single (double) precision operations by deeply pipelining the units. To illustrate the impact of the floating-point units on a kernel, we implement a matrix multiplication kernel based on our floating-point units and show that a state-of-the-art FPGA device is capable of achieving about 15 GFLOPS (8 GFLOPS) for single (double) precision floating-point matrix multiplication. We also show that FPGAs are capable of achieving up to a 6x improvement (for single precision) in the GFLOPS/W (performance per unit power) metric over general-purpose processors. We then discuss the impact of floating-point units on the design of an energy-efficient architecture for the matrix multiply kernel.
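
The relationship between the figures quoted in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not taken from the paper: the function names are hypothetical, area is assumed to be counted in FPGA slices, and each floating-point unit is assumed to be fully pipelined so that it completes one operation per clock cycle.

```python
def throughput_per_area(freq_mhz, area_slices):
    """Throughput/area metric, here in MHz per slice (area unit is an assumption)."""
    return freq_mhz / area_slices


def peak_gflops(num_units, freq_mhz, flops_per_unit_per_cycle=1):
    """Idealized peak rate: every unit retires its operations on every cycle."""
    return num_units * flops_per_unit_per_cycle * freq_mhz / 1000.0


def units_needed(target_gflops, freq_mhz):
    """Number of fully pipelined units needed to sustain a target GFLOPS rate."""
    return target_gflops * 1000.0 / freq_mhz


if __name__ == "__main__":
    # Figures from the abstract: 15 GFLOPS at 240 MHz (single precision),
    # 8 GFLOPS at 200 MHz (double precision).
    print(units_needed(15, 240))
    print(units_needed(8, 200))
```

Under these idealized assumptions, the quoted single-precision figure corresponds to about 62.5 units busy on every cycle and the double-precision figure to 40. The abstract does not state the actual unit count or the clock rate of the complete kernel, so these numbers are only an order-of-magnitude check.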

Keywords

field programmable gate arrays; floating point arithmetic; matrix multiplication; parallel architectures; pipeline arithmetic; FPGA; adder-subtractor unit; energy efficient architecture; field programmable gate array; floating-point multiplier; hardware accelerator; high-performance floating arithmetic; matrix multiplication kernel; pipeline stages; scientific computing; Delay; Energy efficiency; Field programmable gate arrays; Floating-point arithmetic; Frequency; Kernel; Pipeline processing; Scientific computing; Signal processing algorithms; Throughput

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International

Print_ISBN

0-7695-2132-0

Type

conf

DOI

10.1109/IPDPS.2004.1303135

Filename

1303135