Author :
Bodnar, M.R. ; Humphrey, J.R. ; Curt, Petersen F. ; Prather, Dennis W.
Abstract :
Many scientific algorithms, including matrix-vector multiply (MVM), vector dot products, and the discrete cosine transform (DCT), require floating-point reduction operations, or accumulations. Because FPGA implementations of each of these algorithms are desirable, a high-performance floating-point accumulation unit is necessary. However, this type of circuit is difficult to design in an FPGA environment because the floating-point arithmetic units must be deeply pipelined to attain high-performance designs (Durbano et al., 2004; Leeser and Wang, 2004). A deep pipeline requires special handling in feedback circuits because of the long delay, and a continuous input data stream complicates this further. Proposed accumulator architectures that overcome such performance bottlenecks are described in Zhuo et al. (2005) and Zhuo and Prasanna (2005). This paper presents a floating-point accumulation circuit that is a natural evolution of this work. The system can handle streams of arbitrary length, requires modest area, and can handle interrupted data inputs. In contrast to the designs proposed by Zhuo et al., the proposed architecture maintains buffers for partial result storage that use significantly fewer embedded memory resources, while maintaining fixed size and speed characteristics regardless of stream length. The results for both single- and double-precision accumulation architectures were verified in a Virtex-II 8000-4 part clocked at more than 150 MHz, and the power of this design was demonstrated in a computationally intense matrix-matrix-multiply application.
Keywords :
discrete cosine transforms; floating point arithmetic; logic circuits; matrix algebra; pipeline processing; Virtex-II 8000-4; accumulator architectures; floating-point accumulation circuit; floating-point reduction; interrupted data; matrix applications; matrix-matrix-multiply; scientific algorithms; Arithmetic; Buffer storage; Clocks; Computer applications; Computer architecture; Delay; Feedback circuits; Field programmable gate arrays
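Illustrative note :
The feedback problem described in the abstract can be sketched with a small software model. The code below is not the architecture presented in the paper. It is a generic illustration, assuming a pipelined adder with a hypothetical `latency` parameter and one input value per clock cycle. Because a sum issued now is not available until `latency` cycles later, each incoming value is added to whichever partial sum is leaving the pipeline in that cycle, so `latency` independent partial sums circulate and must be reduced at the end. The function name and the use of a Python `deque` as the pipeline are assumptions made for this sketch only.

```python
from collections import deque


def accumulate_interleaved(stream, latency):
    """Software model of accumulation through a pipelined adder.

    Assumption (not from the paper): the adder has `latency` stages and
    accepts one new operand pair per cycle. Returns the total and the
    partial sums that were still in flight when the stream ended.
    """
    # The pipeline starts out holding `latency` empty (zero) partial sums.
    pipeline = deque([0.0] * latency)
    for x in stream:
        partial = pipeline.popleft()   # result emerging from the adder this cycle
        pipeline.append(partial + x)   # new add; its result emerges `latency` cycles later
    partials = list(pipeline)
    # Final reduction of the interleaved partial sums.
    return sum(partials), partials


total, partials = accumulate_interleaved([1, 2, 3, 4, 5, 6, 7], latency=3)
print(total, partials)
```

In this model the stream is consumed at one value per cycle, whereas waiting for each sum before issuing the next would take roughly `latency` cycles per value. The cost is the storage and final reduction of the partial sums, which is the kind of buffering trade-off the abstract refers to.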