Title :
Performance impact of reconfigurable L1 cache on GPU devices
Author :
Ristov, Sasko ; Gusev, Marjan ; Djinevski, Leonid ; Arsenovski, Sime
Author_Institution :
Ss. Cyril & Methodius Univ., Skopje, Macedonia
Abstract :
The newest Kepler GPU architecture offers a reconfigurable L1 cache per Streaming Multiprocessor, with a selectable cache size and cache associativity. Both of these cache parameters affect the overall performance of cache-intensive algorithms, i.e. algorithms that intensively reuse data. In this paper, we analyze the impact of different L1 cache configurations on the execution of the matrix multiplication algorithm for different problem sizes. The basis of our research is the existing theoretical analysis of the performance drawbacks that appear for matrix multiplication when executed on a multicore CPU. We perform a series of experiments to analyze the execution behavior of matrix multiplication on a GPU and its set-associative L1 and L2 cache memory with three different L1 configurations: cache sizes of 16KB and 32KB with 4-way set associativity, and 48KB with 6-way set associativity. The results show that only the L2 cache impacts the algorithm's overall performance, particularly the L2 capacity and set associativity. However, the L1 configuration with 48KB and 6-way set associativity slightly reduces these performance drawbacks compared to the 32KB and 16KB configurations with 4-way set associativity, due to its greater set associativity.
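As a rough illustration of the cache arithmetic behind the three L1 configurations named in the abstract, the following Python sketch computes the number of sets for each one and counts how many distinct sets are touched when walking one column of a row-major matrix. It is not the authors' analysis or code. The 128-byte cache line, the 8-byte element size, the row-major layout, and the function names `num_sets` and `column_set_footprint` are all assumptions made for this example.

```python
# Minimal sketch, not the paper's method. Assumptions: 128-byte cache lines,
# row-major n x n matrix, element size given by the caller, and a simple
# modulo mapping of line index to set index.

def num_sets(cache_bytes, ways, line_bytes=128):
    """Return the number of sets in a set-associative cache."""
    sets, remainder = divmod(cache_bytes, ways * line_bytes)
    if remainder != 0:
        raise ValueError("cache size is not a multiple of ways * line size")
    return sets


def column_set_footprint(n, elem_bytes, cache_bytes, ways, line_bytes=128):
    """Count the distinct sets touched by one column of an n x n matrix."""
    sets = num_sets(cache_bytes, ways, line_bytes)
    row_bytes = n * elem_bytes
    # Element (i, 0) starts at byte offset i * row_bytes.
    touched = {(i * row_bytes // line_bytes) % sets for i in range(n)}
    return len(touched)


# The three L1 configurations from the abstract: (size in bytes, ways).
CONFIGS = [(16 * 1024, 4), (32 * 1024, 4), (48 * 1024, 6)]

if __name__ == "__main__":
    for size, ways in CONFIGS:
        sets = num_sets(size, ways)
        hot = column_set_footprint(1024, 8, size, ways)
        print(f"{size // 1024}KB, {ways}-way: {sets} sets, "
              f"column of a 1024 x 1024 matrix touches {hot} set(s)")
```

Under these assumptions, a column of a 1024 x 1024 matrix of 8-byte elements falls into a single set in every configuration, so the number of column lines that can stay resident is bounded by the associativity (4 or 6). This is one plausible reading of why the 6-way configuration could help slightly; the abstract itself does not spell out the mechanism.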
Keywords :
cache storage; graphics processing units; matrix multiplication; multiprocessing systems; performance evaluation; 4-way cache set associativity; GPU Kepler architecture; GPU devices; cache associativity; cache intensive algorithms; cache parameters; cache size; matrix multiplication algorithm; matrix multiplication execution behavior; memory size 16 KByte; memory size 32 KByte; memory size 48 KByte; multicore CPU; performance impact; reconfigurable L1 cache; set associative L1 cache memory; set associative L2 cache memory; streaming multiprocessor; Algorithm design and analysis; Cache memory; Graphics processing units; Market research; Multicore processing; Performance evaluation; Cache Memory; GPGPU; Set Associativity;
Conference_Title :
Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on
Conference_Location :
Kraków