DocumentCode :
632860
Title :
Loosely or tightly coupled affinity for matrix-vector multiplication
Author :
Velkoski, Goran ; Ristov, Sasko ; Gusev, Marjan
Author_Institution :
Fac. of Inf. Sci. & Comput. Eng., Ss. Cyril & Methodius Univ., Skopje, Macedonia
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
228
Lastpage :
233
Abstract :
Today's CPU cores usually have private L1 and L2 caches and share an L3 cache with the other cores on the chip (die). Whether the cache is private or shared can have a significant impact on the performance of a parallel algorithm, i.e. when using tightly coupled CPU cores that share the same last-level L3 cache, or loosely coupled CPU cores with a private L3 cache per chip. Private caches increase the overall cache size used during execution. On the other hand, a shared cache provides implicit prefetching of data, reducing cache misses if all CPU cores of the chip use the same data. In this paper we analyze the performance of the matrix-vector multiplication (MVM) algorithm, expressed as speed and speedup. We realize sequential and parallel implementations on a multi-chip multi-core multiprocessor in order to determine the CPU affinity that provides the best performance for a parallel implementation using the same number of tightly coupled CPU cores and their loosely coupled counterparts. The results show that working on loosely coupled cores with private L3 caches is better than working on tightly coupled cores with a shared last-level L3 cache in the region where the problem fits in the total L3 cache of the loosely coupled CPU cores but does not fit in the single L3 cache of the tightly coupled CPU cores.
Keywords :
cache storage; coprocessors; matrix multiplication; microprocessor chips; parallel memories; shared memory systems; MVM algorithm performance; data execution; data prefetching; loosely coupled CPU core affinity; matrix vector multiplication; multichip multicore multiprocessor; parallel implementation; private cache; sequential implementation; shared cache storage; tightly coupled CPU core affinity; Artificial neural networks; Cache memory; Instruction sets; Multicore processing; Testing; Vectors; Gustafson's law; high performance computing; shared memory multiprocessor
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information & Communication Technology Electronics & Microelectronics (MIPRO), 2013 36th International Convention on
Conference_Location :
Opatija
Print_ISBN :
978-953-233-076-2
Type :
conf
Filename :
6596257
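Note :
The region named at the end of the abstract (data too large for one L3 cache, yet small enough for the combined L3 caches of several chips) can be illustrated with a rough footprint estimate. The sketch below is only an illustration under stated assumptions, not the authors' method: it assumes a dense square n×n matrix of 8-byte elements, counts the matrix plus the input and output vectors, and uses a hypothetical 8 MiB L3 cache per chip on a hypothetical 4-chip machine. None of these figures come from the record, and the function names are invented for this example.

```python
def mvm_footprint_bytes(n, elem_size=8):
    """Approximate bytes touched by y = A*x for a dense n x n matrix A.

    Assumption: elem_size bytes per element (8 = double precision);
    counts A (n*n elements) plus the vectors x and y (n elements each).
    """
    return (n * n + 2 * n) * elem_size


def loosely_coupled_region(n, l3_bytes, chips, elem_size=8):
    """True when the data exceeds one L3 cache but fits in the combined
    L3 caches of all chips, i.e. the region described in the abstract."""
    footprint = mvm_footprint_bytes(n, elem_size)
    return l3_bytes < footprint <= chips * l3_bytes


L3 = 8 * 1024 * 1024  # hypothetical: 8 MiB of L3 cache per chip
CHIPS = 4             # hypothetical: number of chips in the machine

for n in (512, 1500, 4096):
    print(n, mvm_footprint_bytes(n), loosely_coupled_region(n, L3, CHIPS))
```

The estimate is deliberately coarse: it ignores cache associativity, the replication of the input vector across chips, and any other data sharing the cache, so the boundaries it reports are only indicative.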