Title :
Combining Instruction and Loop Level Parallelism for FPGAs
Author :
Derrien, S. ; Rajopadhye, S. ; Sur-Kolay, S.
Author_Institution :
IRISA
Date :
March 29 2001-April 2 2001
Abstract :
The conventional method of compiling perfect loops of uniform dependence programs to FPGA-based co-processors yields PE arrays in which each processor (PE) executes one instance of the loop body per clock cycle. We develop a transformation framework in which the derived PE can be systematically and automatically pipelined through retiming. We use two well-known transformations, skewing and serialization, to place an arbitrary number of registers at the PE outputs; these registers are then moved into the PE's data-path by the circuit retimers provided in commercial CAD tools. Our experimental measurements (based on performance estimates after place-and-route) are very encouraging: for a number of examples we observe speed increases of an order of magnitude with relatively little area overhead (always less than 50%).
Conference_Title :
Field-Programmable Custom Computing Machines, 2001. FCCM '01. The 9th Annual IEEE Symposium on
Conference_Location :
Rohnert Park, CA, USA
Print_ISBN :
0-7695-2667-5