Custom Floating-Point Unit Generation for Embedded Systems

Author

Chong, Yee Jern ; Parameswaran, Sri

Author_Institution

Sch. of Comput. Sci. & Eng., Univ. of New South Wales, Sydney, NSW

Volume

28

Issue

5

fYear

2009

fDate

5/1/2009 12:00:00 AM

Firstpage

638

Lastpage

650

Abstract

While application-specific instruction-set processors (ASIPs) have allowed designers to create processors with custom instructions to target specific applications, floating-point (FP) units (FPUs) are still instantiated as noncustomizable general-purpose units, which, if underutilized, waste area and performance. Therefore, there is a need for custom FPUs for embedded systems. To create a custom FPU, the subset of FP instructions that should be implemented in hardware has to be determined. Implementing more instructions in hardware reduces the cycle count of the application but may lead to increased latency if the critical delay of the FPU increases. Therefore, a balance between the hardware-implemented and the software-emulated instructions, which produces the best performance, must be found. In order to find this balance, a rapid design space exploration was performed to explore the tradeoffs between the area and the performance. In order to reduce the area of the custom FPU, it is desirable to merge the datapaths for each of the FP operations so that redundant hardware is minimized. However, FP datapaths are complex and contain components with varying bit widths; hence, sharing components of different bit widths is necessary. This introduces the problem of bit alignment, which involves determining how smaller resources should be aligned within larger resources when merged. A novel algorithm for solving the bit-alignment problem during datapath merging was developed. Our results show that adding more FP hardware does not necessarily equate to lower runtime if the delays associated with the additional hardware overcome the cycle count reductions. We found that, with the Mediabench applications, datapath merging with bit alignment reduced area by an average of 22.5%, compared with an average of 14.1% without bit alignment. With the Standard Performance Evaluation Corporation (SPEC) CPU2000 FP (CFP2000) applications, datapath merging with bit alignment reduced area by an average of 7.6%, compared with an average of 3.9% without bit alignment. The less pronounced improvement with the SPEC CFP2000 benchmarks occurs because the SPEC CFP2000 applications predominantly use double-precision operations only. Therefore, there are fewer resources with different bit widths, which benefit less from bit alignment.
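
The tradeoff the abstract describes (fewer cycles versus a longer critical delay) can be illustrated with a minimal sketch. It assumes runtime is simply cycle count multiplied by the clock period set by the critical delay. The configuration names and all numbers are hypothetical, chosen only for illustration; they are not results from the paper, and this is not the authors' exploration method.

```python
# Sketch: runtime = cycle count x clock period (assumed model).
# All configurations and figures below are hypothetical.

def runtime_ns(cycle_count, critical_delay_ns):
    """Estimated runtime when the clock period equals the critical delay."""
    return cycle_count * critical_delay_ns

# (cycle count, critical delay in ns) for three made-up FPU configurations.
configs = {
    "software_emulated": (10_000_000, 4.0),  # no FP hardware, many cycles
    "add_mul_in_hw": (4_000_000, 5.0),       # some FP instructions in hardware
    "all_fp_in_hw": (3_800_000, 6.0),        # most hardware, longest delay
}

runtimes = {name: runtime_ns(c, d) for name, (c, d) in configs.items()}
best = min(runtimes, key=runtimes.get)

for name, t in runtimes.items():
    print(f"{name}: {t / 1e6:.1f} ms")
print("best:", best)
```

With these made-up figures, the configuration with the most FP hardware has the lowest cycle count but not the lowest runtime, because its longer critical delay outweighs the cycles it saves.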

Keywords

application specific integrated circuits; embedded systems; floating point arithmetic; microprocessor chips; CPU2000 FP; Standard Performance Evaluation Corporation; application-specific instruction-set processors; bit-alignment problem; custom floating-point unit generation; custom instructions; datapath merging; hardware-implemented instructions; software-emulated instructions; Bit alignment; floating-point (FP) arithmetic; merging; resource sharing

fLanguage

English

Journal_Title

Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

Publisher

IEEE

ISSN

0278-0070

Type

jour

DOI

10.1109/TCAD.2009.2013999

Filename

4838813