DocumentCode
748133
Title
Custom Floating-Point Unit Generation for Embedded Systems
Author
Chong, Yee Jern ; Parameswaran, Sri
Author_Institution
Sch. of Comput. Sci. & Eng., Univ. of New South Wales, Sydney, NSW
Volume
28
Issue
5
fYear
2009
fDate
5/1/2009 12:00:00 AM
Firstpage
638
Lastpage
650
Abstract
While application-specific instruction-set processors (ASIPs) have allowed designers to create processors with custom instructions to target specific applications, floating-point (FP) units (FPUs) are still instantiated as noncustomizable general-purpose units, which, if underutilized, wastes area and performance. Therefore, there is a need for custom FPUs for embedded systems. To create a custom FPU, the subset of FP instructions that should be implemented in hardware has to be determined. Implementing more instructions in hardware reduces the cycle count of the application but may lead to increased latency if the critical delay of the FPU increases. Therefore, a balance between the hardware-implemented and the software-emulated instructions, which produces the best performance, must be found. In order to find this balance, a rapid design space exploration was performed to explore the tradeoffs between the area and the performance. In order to reduce the area of the custom FPU, it is desirable to merge the datapaths for each of the FP operations so that redundant hardware is minimized. However, FP datapaths are complex and contain components with varying bit widths; hence, sharing components of different bit widths is necessary. This introduces the problem of bit alignment, which involves determining how smaller resources should be aligned within larger resources when merged. A novel algorithm for solving the bit-alignment problem during datapath merging was developed. Our results show that adding more FP hardware does not necessarily equate to lower runtime if the delays associated with the additional hardware overcomes the cycle count reductions. We found that, with the Mediabench applications, datapath merging with bit alignment reduced area by an average of 22.5%, compared with an average of 14.1% without bit alignment. With the Standard Performance Evaluation Corporation (SPEC) CPU2000 FP (CFP2000) applications, datapath merging with bit alignment reduced area- - by an average of 7.6%, compared with an average of 3.9% without bit alignment. The less pronounced improvement with the SPEC CFP2000 benchmarks occurs because the SPEC CFP2000 applications predominantly use double-precision operations only. Therefore, there are fewer resources with different bit widths, which benefit less from bit alignment.
Keywords
application specific integrated circuits; embedded systems; floating point arithmetic; microprocessor chips; CPU2000 FP; Standard Performance Evaluation Corporation; application-specific instruction-set processors; bit-alignment problem; custom floating-point unit generation; custom instructions; datapath merging; embedded systems; hardware-implemented instructions; software-emulated instructions; Bit alignment; floating-point (FP) arithmetic; merging; resource sharing;
fLanguage
English
Journal_Title
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on
Publisher
ieee
ISSN
0278-0070
Type
jour
DOI
10.1109/TCAD.2009.2013999
Filename
4838813
Link To Document