DocumentCode :
2950097
Title :
Reconciling specialization and flexibility through compound circuits
Author :
Yehia, Sami ; Girbal, Sylvain ; Berry, Hugues ; Temam, Olivier
Author_Institution :
Embedded Syst. Lab., Thales Res. & Technol.
fYear :
2009
fDate :
14-18 Feb. 2009
Firstpage :
277
Lastpage :
288
Abstract :
While parallelism and multi-cores are receiving much attention as a major scalability path, customization is another, orthogonal and complementary, scalability path which can target not easily parallelizable programs or program sections. The key assets of customization are cost and power efficiency. The key limitation of customization is flexibility. However, we argue that there is no perfect balance between efficiency and flexibility; each system vendor may want to strike a different balance. In this article, we present a method for achieving any desired balance between flexibility and efficiency by automatically combining any set of individual customization circuits into a larger compound circuit. This circuit is significantly more cost efficient than the simple union of all target circuits, and is configurable to behave as any of the target circuits, while avoiding the routing and configuration cost overhead of FPGAs. The more individual circuits are included, the larger the number of applications which can potentially benefit from this compound customization circuit, realizing flexibility at a minimal cost. Moreover, we observe that the compound circuit cost does not increase in proportion to the number of target applications, due to the wide range of common data-flow and control-flow patterns in programs. Currently, the target individual circuits correspond to loops, like most accelerators in embedded systems, but the aggregation method can accommodate circuits of any size. Using the UTDSP benchmarks and accelerators coupled with an embedded PowerPC405 processor, we show that this approach can yield an average performance improvement of 2.97, while the corresponding synthesized aggregate accelerator is 3 times smaller than the sum of individual accelerators for each target benchmark.
Keywords :
benchmark testing; embedded systems; field programmable gate arrays; microprocessor chips; FPGA; UTDSP benchmarks; accelerators; aggregation method; compound customization circuit; control-flow patterns; data-flow patterns; embedded PowerPC405 processor; embedded systems; individual customization circuits; parallelizable programs; program sections; Aggregates; Circuit synthesis; Costs; Coupling circuits; Embedded system; Field programmable gate arrays; Flexible printed circuits; Proportional control; Routing; Scalability;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on
Conference_Location :
Raleigh, NC
ISSN :
1530-0897
Print_ISBN :
978-1-4244-2932-5
Type :
conf
DOI :
10.1109/HPCA.2009.4798263
Filename :
4798263