• DocumentCode
    629675
  • Title
    VASILE: A reconfigurable vector architecture for instruction level frequency scaling
  • Author
    Petrica, Lucian ; Codreanu, Valeriu ; Cotofana, Sorin
  • Author_Institution
    Politeh. Univ. of Bucharest, Bucharest, Romania
  • fYear
    2013
  • fDate
    20-21 June 2013
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Coarse-grained dynamic frequency scaling has been extensively utilised in embedded (multiprocessor) platforms to reduce energy consumption and, by implication, to extend autonomy and battery lifetime. In this paper we propose to make use of fine-grained frequency scaling, i.e., adjusting the frequency at instruction level, to increase the instruction throughput of an FPGA-implemented Vector Processor (VP). We introduce a VP architectural template and an associated design methodology that enable the creation of VP instances tailored to application requirements. For each instance, the data-path delays of individual instructions are optimised separately, guided by profiling data for the target application class, maximising the performance of frequently utilised instructions at the expense of those executed less often. In this way, instructions are divided into clock frequency classes according to their data-path delay, and at run time the clock frequency is scaled to the value required by the class of the instruction about to be executed. During application execution, different VP instances are dynamically configured in the FPGA to provide the most appropriate hardware support, improving throughput without increasing power consumption and therefore reducing energy. As operating frequency changes incur a time penalty, which may diminish the actual performance gain, the application code is optimised during compilation to reduce the number of runtime clock switches via techniques such as loop tiling and instruction clustering. We evaluate the effectiveness of the proposed approach on several computational kernels used in image processing applications, namely sum of absolute differences, sum of squared differences, and Gaussian filtering.
    Our results indicate that an average instruction throughput increase of 20% and a 15% reduction in energy consumption are achieved through the utilisation of runtime reconfiguration and fine-grained frequency scaling.
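The trade-off described in the abstract (per-class clock frequencies versus the time penalty of every clock switch, and compile-time reordering to reduce the number of switches) can be illustrated with a small cost model. This is a minimal Python sketch under stated assumptions, not the VASILE toolchain or the authors' compiler pass: the opcodes (`vld`, `vsub`, `vmul`, `vadd`), their class frequencies, and the 100 ns switch penalty are all hypothetical. Execution time is modeled as one cycle per instruction at its class frequency plus a fixed penalty whenever the frequency class changes, and a tiled schedule for an SSD-like loop body groups instructions of the same class.

```python
from typing import Dict, List

# HYPOTHETICAL frequency classes (MHz) per opcode; in the paper these would come
# from the per-instance data-path delay optimisation, not from fixed values.
FREQ_CLASS_MHZ: Dict[str, float] = {
    "vld": 200.0,   # vector load
    "vsub": 250.0,  # vector subtract
    "vmul": 150.0,  # vector multiply (assumed slower data path)
    "vadd": 250.0,  # vector add / accumulate
}

SWITCH_PENALTY_NS = 100.0  # HYPOTHETICAL cost of one clock-frequency change


def count_switches(trace: List[str]) -> int:
    """Number of adjacent instruction pairs whose frequency classes differ."""
    return sum(
        1 for prev, nxt in zip(trace, trace[1:])
        if FREQ_CLASS_MHZ[prev] != FREQ_CLASS_MHZ[nxt]
    )


def exec_time_ns(trace: List[str]) -> float:
    """One cycle per instruction at its class frequency, plus a fixed
    penalty for every clock switch."""
    cycle_time = sum(1000.0 / FREQ_CLASS_MHZ[op] for op in trace)  # MHz -> ns
    return cycle_time + SWITCH_PENALTY_NS * count_switches(trace)


def untiled_trace(body: List[str], n: int) -> List[str]:
    """Execute the whole loop body once per element."""
    return body * n


def tiled_trace(body: List[str], n: int, tile: int) -> List[str]:
    """Within each tile, run each body instruction for every element of the
    tile before moving to the next instruction. Only valid when the loop's
    dependencies permit this reordering (true for the SSD body below)."""
    trace: List[str] = []
    for start in range(0, n, tile):
        size = min(tile, n - start)  # last tile may be shorter
        for op in body:
            trace.extend([op] * size)
    return trace


# Sum of squared differences: load two vectors, subtract, square, accumulate.
SSD_BODY = ["vld", "vld", "vsub", "vmul", "vadd"]

if __name__ == "__main__":
    n, tile = 64, 16
    for name, tr in [("untiled", untiled_trace(SSD_BODY, n)),
                     ("tiled", tiled_trace(SSD_BODY, n, tile))]:
        print(f"{name}: {count_switches(tr)} switches, {exec_time_ns(tr):.0f} ns")
```

For 64 elements and a tile size of 16, the untiled trace needs 255 clock switches and the tiled trace 15. Both traces contain the same instruction mix, so their summed cycle times match and the whole difference comes from avoided switch penalties; the absolute magnitudes depend entirely on the assumed penalty and frequencies.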
  • Keywords
    clocks; coprocessors; field programmable gate arrays; instruction sets; integrated circuit design; multiprocessing systems; reconfigurable architectures; FPGA implemented vector processor; Gaussian filtering; VASILE; VP architectural template; autonomy lifetime; battery lifetime; clock frequency classes; coarse-grained dynamic frequency scaling; data profiling; data-path delays; design methodology; embedded platforms; energy reduction; fine-grained frequency scaling; image processing applications; instruction clustering; instruction level frequency scaling; instruction throughput; loop tiling; multiprocessor platforms; reconfigurable vector architecture; runtime clock switch number reduction; sum of absolute differences; sum of squared differences; Clocks; Computer architecture; Delays; Field programmable gate arrays; Performance evaluation; Vector processors; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Faible Tension Faible Consommation (FTFC), 2013 IEEE
  • Conference_Location
    Paris
  • Print_ISBN
    978-1-4673-6105-7
  • Type
    conf
  • DOI
    10.1109/FTFC.2013.6577772
  • Filename
    6577772