Title :
Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures
Author :
Dajiang Liu;Shouyi Yin;Leibo Liu;Shaojun Wei
Author_Institution :
Inst. of Microelectron., Tsinghua Univ., Beijing, China
fDate :
5/1/2014 12:00:00 AM
Abstract :
A coarse-grained reconfigurable architecture is a promising architecture with high power efficiency, typically composed of a host controller and a processing element array (PEA). Loops are often mapped onto PEAs for acceleration. In previous work, the innermost loop is pipelined, and the maximal number of concurrently executable operators (CEOs) in the kernel is limited by the inner loop. Consider the loop-body data flow graph (DFG) of an input 2D nested loop with an inner-loop-carried dependence ([0,1]) and an outer-loop-carried dependence ([1,1]). We map this loop onto a 4×4 PEA with pipelining. Let Lb be the latency of executing one loop iteration, and let Wk be the number of iterations involved in one cycle of the kernel phase of pipelining. Given the inner-loop dependence ([0,1]), the initiation interval of inner-loop pipelining (IIi) can be minimized to 1, which gives Wk = 4. The angle α enclosed by the two sides in Figure 1(b) then satisfies: tan(α) = Wk/Lb = 1/IIi.
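The relation tan(α) = Wk/Lb = 1/IIi can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `kernel_width` and `alpha_degrees` are hypothetical, and the value Lb = 4 is not stated in the abstract but follows from the relation once Wk = 4 and IIi = 1 are fixed.

```python
import math
from fractions import Fraction


def kernel_width(latency_lb: int, ii_inner: int) -> Fraction:
    """Return Wk = Lb / IIi, rearranged from tan(alpha) = Wk/Lb = 1/IIi.

    Hypothetical helper for illustration; both arguments are in cycles.
    """
    if latency_lb <= 0 or ii_inner <= 0:
        raise ValueError("latency and initiation interval must be positive")
    return Fraction(latency_lb, ii_inner)


def alpha_degrees(ii_inner: int) -> float:
    """Return the angle alpha (in degrees) with tan(alpha) = 1/IIi."""
    if ii_inner <= 0:
        raise ValueError("initiation interval must be positive")
    return math.degrees(math.atan(1 / ii_inner))


# Assumed value: Lb = 4 is implied by Wk = 4 and IIi = 1, not quoted from the abstract.
wk = kernel_width(latency_lb=4, ii_inner=1)
alpha = alpha_degrees(ii_inner=1)
print(f"Wk = {wk}, alpha is about {alpha:.1f} degrees")
```

Under these assumptions, a larger IIi flattens the angle and reduces the number of iterations overlapping in the kernel for the same Lb.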
Keywords :
"Kernel","Pipeline processing","Arrays","Microelectronics","Educational institutions"
Conference_Titel :
Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on
DOI :
10.1109/FCCM.2014.19