VASILE: A reconfigurable vector architecture for instruction level frequency scaling

Author

Petrica, Lucian; Codreanu, Valeriu; Cotofana, Sorin

Author_Institution

Politeh. Univ. of Bucharest, Bucharest, Romania

fYear

2013

fDate

20-21 June 2013

Firstpage

Lastpage

Abstract

Coarse-grained dynamic frequency scaling has been extensively utilised in embedded (multiprocessor) platforms to reduce energy consumption and, by implication, to extend autonomy and battery lifetime. In this paper we propose to use fine-grained frequency scaling, i.e., adjusting the frequency at instruction level, to increase the instruction throughput of an FPGA-implemented Vector Processor (VP). We introduce a VP architectural template and an associated design methodology that enables the creation of VP instances tailored to application requirements. For each instance, the data-path delays of individual instructions are optimised separately, guided by profiling data for the target application class, maximising the performance of frequently utilised instructions to the detriment of those executed less often. In this way, instructions are divided into clock frequency classes according to their data-path delay, and at run time the clock frequency is scaled to the value required by the class of the instruction about to be executed. During application execution, different VP instances are dynamically configured in the FPGA to provide the most appropriate hardware support, optimising application throughput without increasing power consumption and therefore reducing energy. As operating frequency changes induce a time penalty, which may diminish the actual performance gain, the application code is optimised during compilation to reduce the number of runtime clock switches, e.g., via loop tiling and instruction clustering. We evaluate the effectiveness of the proposed approach on several computational kernels used in image processing applications, namely sum of absolute differences, sum of squared differences, and Gaussian filtering.
Our results indicate that runtime reconfiguration and fine-grained frequency scaling yield an average instruction throughput increase of 20% and a 15% reduction in energy consumption.
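The timing trade-off described in the abstract (faster clocks for fast instruction classes, but a penalty on every clock switch) can be illustrated with a small cost model. This is a minimal sketch, not the authors' method: the frequency values, the instruction-to-class mapping, the one-cycle-per-instruction assumption, and the switch penalty are all hypothetical, and FPGA reconfiguration between VP instances is not modelled. It also assumes the instructions in the example have no data dependencies, so that reordering them (instruction clustering) is legal.

```python
from itertools import groupby

# Hypothetical clock frequency classes (MHz); NOT values from the paper.
CLASS_FREQ_MHZ = {"fast": 200.0, "medium": 160.0, "slow": 125.0}

# Hypothetical mapping from vector instruction mnemonic to frequency class.
INSTR_CLASS = {"vadd": "fast", "vsub": "fast", "vabs": "fast",
               "vmul": "medium", "vmac": "slow"}


def clock_switches(program):
    """Count clock class changes between consecutive instructions."""
    classes = [INSTR_CLASS[op] for op in program]
    runs = [cls for cls, _ in groupby(classes)]  # one entry per run of equal classes
    return max(len(runs) - 1, 0)


def exec_time_ns(program, switch_penalty_ns):
    """Assume one cycle per instruction at its class frequency,
    plus a fixed time penalty for every clock switch."""
    busy = sum(1e3 / CLASS_FREQ_MHZ[INSTR_CLASS[op]] for op in program)
    return busy + clock_switches(program) * switch_penalty_ns


def single_clock_time_ns(program):
    """Baseline: the whole program runs at the frequency of its slowest class."""
    f_min = min(CLASS_FREQ_MHZ[INSTR_CLASS[op]] for op in program)
    return len(program) * 1e3 / f_min


# Same instructions, interleaved vs. clustered by frequency class
# (assumed independent, so the reordering is legal).
interleaved = ["vadd", "vmac", "vsub", "vmac", "vabs", "vmac"]
clustered = ["vadd", "vsub", "vabs", "vmac", "vmac", "vmac"]

if __name__ == "__main__":
    penalty = 2.0  # hypothetical switch penalty in ns
    print(clock_switches(interleaved), exec_time_ns(interleaved, penalty))  # 5 49.0
    print(clock_switches(clustered), exec_time_ns(clustered, penalty))      # 1 41.0
    print(single_clock_time_ns(interleaved))                                # 48.0
```

With these made-up numbers, the interleaved sequence (49 ns) is slower than simply running everything at the slowest clock (48 ns), while clustering the same instructions cuts the switches from five to one and brings the time down to 41 ns. This mirrors the abstract's point that compile-time transformations such as instruction clustering are needed for instruction-level frequency scaling to pay off.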

Keywords

clocks; coprocessors; field programmable gate arrays; instruction sets; integrated circuit design; multiprocessing systems; reconfigurable architectures; FPGA implemented vector processor; Gaussian filtering; VASILE; VP architectural template; autonomy lifetime; battery lifetime; clock frequency classes; coarse-grained dynamic frequency scaling; data profiling; data-path delays; design methodology; embedded platforms; energy reduction; fine-grained frequency scaling; image processing applications; instruction clustering; instruction level frequency scaling; instruction throughput; loop tiling; multiprocessor platforms; reconfigurable vector architecture; runtime clock switch number reduction; sum-of-absolute differences; sum-of-squared differences; Clocks; Computer architecture; Delays; Field programmable gate arrays; Performance evaluation; Vector processors; Vectors;

fLanguage

English

Publisher

IEEE

Conference_Title

Faible Tension Faible Consommation (FTFC), 2013 IEEE

Conference_Location

Paris

Print_ISBN

978-1-4673-6105-7

Type

conf

DOI

10.1109/FTFC.2013.6577772

Filename

6577772

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=629675