Title :
Enabling loop fusion and tiling for cache performance by fixing fusion-preventing data dependences
Author :
Xue, Jingling ; Huang, Qingguang ; Guo, Minyi
Author_Institution :
Sch. of Comput. Sci. & Eng., New South Wales Univ., Sydney, NSW, Australia
Abstract :
This paper presents a new approach to enabling loop fusion and tiling for arbitrary affine loop nests. Given multiple loop nests, we present techniques that automatically eliminate all fusion-preventing dependences by means of loop tiling and array copying. Applying our techniques iteratively to multiple loop nests yields a single loop nest that can be tiled for cache locality. Our approach handles LU, QR, Cholesky and Jacobi in a unified framework. Our experimental evaluation on an SGI Octane2 system shows that the benefit of the significantly reduced L1 and L2 cache misses far outweighs the branching and loop control overhead introduced by our approach.
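As a rough illustration of what a fusion-preventing dependence is, the sketch below uses a one-dimensional Jacobi-style sweep. It is not the paper's algorithm, which handles arbitrary affine loop nests through tiling and array copying. The function names `jacobi_unfused` and `jacobi_fused` and the scalars `old_left` and `old_here` are hypothetical, chosen only for this example. Fusing the two loops naively would make iteration i read `a[i - 1]` after it had already been overwritten, so the fused version keeps a copy of the old value instead.

```c
#include <stddef.h>

/* Reference version: two separate loops (one Jacobi-style sweep).
 * The first loop reads only old values of a; the second overwrites a. */
void jacobi_unfused(double *a, double *b, size_t n) {
    for (size_t i = 1; i + 1 < n; i++)
        b[i] = 0.5 * (a[i - 1] + a[i + 1]);
    for (size_t i = 1; i + 1 < n; i++)
        a[i] = b[i];
}

/* Fused version (illustrative assumption, not the authors' method).
 * A naive fusion would read a[i - 1] after the previous iteration
 * overwrote it. Keeping a copy of the old value removes that
 * fusion-preventing dependence. */
void jacobi_fused(double *a, double *b, size_t n) {
    if (n < 3)
        return;                      /* no interior points to update */
    double old_left = a[0];          /* original value of a[i - 1] */
    for (size_t i = 1; i + 1 < n; i++) {
        double old_here = a[i];      /* save a[i] before overwriting it */
        b[i] = 0.5 * (old_left + a[i + 1]);  /* a[i + 1] is still old */
        a[i] = b[i];
        old_left = old_here;         /* becomes a[i - 1] for the next i */
    }
}
```

Both functions leave `a` and `b` with the same contents. The fused form touches each element in a single pass, and it is a single loop of the kind that tiling can then be applied to.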
Keywords :
cache storage; parallel processing; program control structures; SGI Octane2 system; arbitrary affine loop nests; array copying; cache performance; data dependences; loop fusion; loop tiling; Australia; Computer science; Control systems; Data engineering; Iterative algorithms; Jacobian matrices; Kernel; Tiles
Conference_Title :
Parallel Processing, 2005. ICPP 2005. International Conference on
Print_ISBN :
0-7695-2380-3
DOI :
10.1109/ICPP.2005.37