• DocumentCode
    1559519
  • Title

    Automatic compilation of loops to exploit operator parallelism on configurable arithmetic logic units

  • Author

    Ramasubramanian, Narasimhan ; Subramanian, Ram ; Pande, Santosh

  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • Volume
    13
  • Issue
    1
  • fYear
    2002
  • fDate
    1/1/2002 12:00:00 AM
  • Firstpage
    45
  • Lastpage
    66
  • Abstract
    Configurable arithmetic logic units (ALUs) offer opportunities for adapting the underlying hardware to support the varying amount of parallelism in the computation. The problem of identifying the optimal parallel configurations (a configuration is defined as a given hardware implementation of different operators along with their multiplicities) at different steps in a program is a very complex issue but, if solved, allows the power of these ALUs to be maximally used. This paper focuses on developing an automatic compilation framework for configuration analysis to exploit operator parallelism within loop nests. The focus of this work is on performing configuration analysis to minimize costly reconfiguration overheads. In our framework, we initially carry out some operator and loop transformations to expose more opportunities for configuration reuse. We then present a two pass solution. The first pass attempts to generate either maximal cutsets (a cutset is defined as a group of statements that execute under a given configuration) or maximally parallel configurations by performing an analysis on the program dependency graph (PDG) of a loop nest. The second pass analyzes the trade-offs between the costs and benefits of reconfigurations across different cutsets and attempts to eliminate the reconfiguration overheads by merging cutsets. This methodology is implemented in the SUIF compilation system and is tested using some loops extracted from Perfect benchmarks and Livermore kernels. Good speedups are obtained, showing the merit of the proposed method. The method also scales well with the loop sizes and the amount of space available on FPGAs for configurable logic
  • Keywords
    field programmable gate arrays; parallel programming; program compilers; program control structures; reconfigurable architectures; FPGAs; Livermore kernels; Perfect benchmarks; SUIF compilation system; automatic compilation loops; configurable arithmetic logic units; configurable logic; configuration loop; configuration operator; loop nests; maximal cutsets; operator parallelism; optimal parallel configurations; program dependency graph; Arithmetic; Benchmark testing; Concurrent computing; Cost benefit analysis; Hardware; Logic; Merging; Parallel processing; Performance analysis; System testing;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/71.980026
  • Filename
    980026