Title :
Combining optimization for cache and instruction-level parallelism
Author_Institution :
Dept. of Comput. Sci., Michigan Technol. Univ., Houghton, MI, USA
Abstract :
Current architectural trends in instruction-level parallelism (ILP) have significantly increased the computational power of microprocessors. As a result, the demands on the memory system have increased dramatically. Compilers must not only find enough ILP to utilize machine resources effectively, but also ensure that the resulting code exhibits a high degree of cache locality. Previous work has concentrated either on improving ILP in nested loops or on improving cache performance. This paper presents a performance metric that can be used to guide the optimization of nested loops by considering the combined effects of ILP, data reuse, and latency-hiding techniques. We have implemented the technique in a source-to-source transformation system called Memoria. Preliminary experiments reveal that dramatic performance improvements for nested loops are obtainable (we regularly obtain a factor of 2 or more on kernels run on two different architectures).
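Illustration (not from the paper): the abstract describes combining cache optimization with ILP exposure in nested loops. As a minimal sketch of the kind of source-to-source transformation such a system might apply, the following C code shows a matrix-multiply kernel before and after tiling (for data reuse in cache) combined with unroll-and-jam (to create independent accumulations that hide latency). The kernel, tile size, and unroll factor are illustrative assumptions, not taken from the paper or from Memoria.

    /* Illustrative sketch only: tiling plus unroll-and-jam on a
     * matrix-multiply kernel.  Names, tile size, and unroll factor are
     * assumptions, not taken from the paper. */
    #include <stdio.h>

    #define N    256
    #define TILE 32            /* cache-tiling factor (assumed) */

    static double A[N][N], B[N][N], C[N][N];

    /* Baseline triply nested loop: poor cache reuse, one serial
     * accumulation chain per (i, j) element. */
    static void matmul_naive(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
    }

    /* Tiled over j and k for cache reuse, with the i-loop
     * unrolled-and-jammed by 2 so two independent accumulations are
     * available to keep the pipeline busy. */
    static void matmul_tiled_unrolled(void)
    {
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < N; kk += TILE)
                for (int i = 0; i < N; i += 2)        /* unroll-and-jam by 2 */
                    for (int j = jj; j < jj + TILE; j++) {
                        double c0 = C[i][j], c1 = C[i + 1][j];
                        for (int k = kk; k < kk + TILE; k++) {
                            c0 += A[i][k]     * B[k][j];  /* independent of c1 */
                            c1 += A[i + 1][k] * B[k][j];  /* B[k][j] reused    */
                        }
                        C[i][j]     = c0;
                        C[i + 1][j] = c1;
                    }
    }

    int main(void)
    {
        /* Simple initialization so both versions do real work. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = (double)(i + j);
                B[i][j] = (double)(i - j);
            }

        matmul_naive();
        double ref = C[N - 1][N - 1];

        /* Reset C and re-run the transformed version; the summation
         * order per element is unchanged, so results are identical. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                C[i][j] = 0.0;
        matmul_tiled_unrolled();

        printf("results match: %d\n", ref == C[N - 1][N - 1]);
        return 0;
    }

The transformed loop nest reuses B[k][j] across the two unrolled i-iterations and keeps the working set within a cache tile, while the two independent accumulators (c0, c1) expose ILP to the scheduler; this is the tension between locality and parallelism that the paper's metric is intended to balance.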
Keywords :
cache storage; computer architecture; instruction sets; memory architecture; microprocessor chips; Memoria; architectural trends; cache; compilers; data reuse; instruction-level parallelism; latency hiding techniques; microprocessors; nested loops; optimization; performance metric; source-to-source transformation system; Computer aided instruction; Computer science; Delay; Kernel; Microprocessors; Optimization methods; Parallel processing; Pipeline processing; Software measurement; Tiles;
Conference_Titel :
Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96)
Conference_Location :
Boston, MA
Print_ISBN :
0-8186-7633-7
DOI :
10.1109/PACT.1996.552672