DocumentCode :
3355058
Title :
Optimizing memory accesses for spatial computation
Author :
Budiu, Mihai ; Goldstein, Seth C.
Author_Institution :
Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
2003
fDate :
23-26 March 2003
Firstpage :
216
Lastpage :
227
Abstract :
We present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow and dependence information. In CASH, memory optimization is a four-step process: (1) an initial, relatively coarse representation of memory dependences is built; (2) unnecessary memory dependences are removed using dependence tests; (3) redundant memory operations are removed; and (4) parallelism is increased by pipelining memory accesses in loops. While the first three steps are very general, the loop pipelining transformations are particularly applicable to spatial computation, which is the primary target of CASH. The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal), and loop-invariant load motion. One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed to slip dynamically with respect to each other. A new computational primitive, a token generator, is used to dynamically control the amount of slip, allowing maximum freedom while guaranteeing that no memory dependences are violated.
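The sketch below illustrates two of the redundant memory removal optimizations named in the abstract, load-after-store removal and store-before-store removal, on a toy straight-line operation list. It is not the CASH implementation. The tuple format, the function name `remove_redundant_memory_ops`, and the assumption that distinct address names never alias are all hypothetical choices made for this example. CASH itself works on an SSA-based memory representation and uses dependence tests.

```python
def remove_redundant_memory_ops(ops):
    """Simplify a straight-line list of memory operations.

    Each op is ("store", addr, value) or ("load", addr, dest).
    ASSUMPTION (hypothetical, for illustration only): two different
    address names never refer to the same memory location.
    """
    out = []
    last_store = {}  # addr -> index in `out` of the latest store to addr
    known = {}       # addr -> value known to be held at addr right now

    for kind, addr, x in ops:
        if kind == "store":
            # Store-before-store removal: the earlier store to this
            # address was never read from memory, so it is dead.
            if addr in last_store:
                out[last_store[addr]] = None
            last_store[addr] = len(out)
            known[addr] = x
            out.append(("store", addr, x))
        elif addr in known:
            # Load-after-store removal: forward the stored value
            # instead of reading it back from memory.
            out.append(("copy", known[addr], x))
        else:
            # Nothing is known about this address; keep the load.
            out.append(("load", addr, x))

    return [op for op in out if op is not None]


example = [
    ("store", "p", 1),
    ("load", "p", "t0"),   # can reuse the value just stored
    ("store", "p", 2),     # makes the first store dead
    ("load", "q", "t1"),   # unrelated address, kept as-is
]
simplified = remove_redundant_memory_ops(example)
```

Once the load is replaced by a copy, it no longer reads memory, so the first store to `p` can be dropped when the second store overwrites it.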
Keywords :
optimising compilers; program control structures; storage management; CASH compiler; SSA-based representation; common-subexpression elimination; computational primitive; control-flow information; dead store removal; dependence information; dependence tests; independent loops; independent memory accesses; initial coarse representation; internal representation; load-after-store removal; load/store hoisting; loop body; loop decoupling; loop parallelization; loop pipelining transformations; loop-invariant load motion; memory access optimization; memory dependences; memory optimization; memory parallelism; pipelining memory accesses; pointer-based programs; primary target; redundant memory operations; redundant memory removal optimizations; spatial computation; store-before-store removal; subsuming partial redundancy elimination; token generator; Asynchronous circuits; Computer science; Hardware; High level languages; Optimizing compilers; Parallel processing; Pipeline processing; Program processors; Testing; Wires;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Code Generation and Optimization, 2003. CGO 2003. International Symposium on
Print_ISBN :
0-7695-1913-X
Type :
conf
DOI :
10.1109/CGO.2003.1191547
Filename :
1191547