Title :
Cache performance and algorithm optimization
Author_Institution :
Inst. of Comput. Technol., Acad. Sinica, Beijing, China
Date :
28 Apr-2 May 1997
Abstract :
A technique for improving the cache performance of some blocked algorithms is proposed. Drawing on results from number theory, the author presents a principle for array padding that ensures accesses to array subblocks do not generate conflict misses. The technique is applied to LU factorization and matrix multiplication, and the principle is tested on a shared-memory multiprocessor. The practical results agree with the theoretical analysis, and performance increases of 20% to 150% are achieved.
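The abstract does not state the padding principle itself, so the following is only a generic illustration of the effect it targets, not the author's number-theoretic rule. It is a minimal sketch that assumes a direct-mapped cache and a row-major array. The function names and all parameters (`ld`, `block`, `num_sets`, `line_elems`) are hypothetical. The sketch counts how many cache lines of one subblock are forced to share a cache set, and searches by brute force for the smallest padding of the leading dimension that removes those collisions.

```python
def conflict_count(ld, block, num_sets, line_elems):
    """Count cache lines of a block x block subblock that must share a set.

    Assumptions (not from the paper): direct-mapped cache with `num_sets`
    sets, `line_elems` array elements per cache line, row-major storage
    with leading dimension `ld`, subblock starting at element 0.
    """
    # Every distinct cache line touched by the subblock.
    lines = {(i * ld + j) // line_elems
             for i in range(block) for j in range(block)}
    # In a direct-mapped cache, a line's set is its index modulo num_sets.
    sets = {line % num_sets for line in lines}
    # Lines beyond the number of distinct sets cannot all stay resident.
    return len(lines) - len(sets)


def smallest_pad(n, block, num_sets, line_elems, max_pad=1024):
    """Brute-force search for the smallest pad giving zero conflicts.

    Returns None if no pad up to `max_pad` works. This exhaustive search
    stands in for the paper's analytical principle, which is not given
    in the abstract.
    """
    for pad in range(max_pad + 1):
        if conflict_count(n + pad, block, num_sets, line_elems) == 0:
            return pad
    return None
```

With a hypothetical cache of 256 sets holding 4 elements per line, a 16 x 16 subblock of an array with leading dimension 1024 touches 64 lines that map to only 4 sets, so `conflict_count(1024, 16, 256, 4)` reports 60 colliding lines. Padding the leading dimension to 1040 spreads the same lines over 64 distinct sets, and `smallest_pad(1024, 16, 256, 4)` finds that pad of 16.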
Keywords :
cache storage; matrix multiplication; number theory; optimisation; reduced instruction set computing; shared memory systems; subroutines; LU factorization; algorithm optimization; array padding; array subblock access; blocked algorithms; cache performance; performance; shared memory multiprocessor; Computers; Concurrent computing; Data structures; Delay; Optimization methods; Parallel algorithms; Performance analysis; System testing
Conference_Title :
High Performance Computing on the Information Superhighway, 1997. HPC Asia '97
Conference_Location :
Seoul
Print_ISBN :
0-8186-7901-8
DOI :
10.1109/HPC.1997.592114