Regional Information Center for Science and Technology - Loosely or tightly coupled affinity for matrix

DocumentCode :

632860

Title :

Loosely or tightly coupled affinity for matrix - Vector multiplication

Author :

Velkoski, Goran ; Ristov, Sasko ; Gusev, Marjan

Author_Institution :

Fac. of Inf. Sci. & Comput. Eng., Ss. Cyril & Methodius Univ., Skopje, Macedonia

fYear :

2013

fDate :

20-24 May 2013

Firstpage :

228

Lastpage :

233

Abstract :

Today's CPU cores usually have private L1 and L2 caches and share the L3 cache with the other cores on the chip (die). Whether a cache is private or shared can have a significant impact on the performance of a parallel implementation, i.e. when using tightly coupled CPU cores that share the same last-level L3 cache, or loosely coupled CPU cores with a private L3 cache per chip. Private caches increase the overall cache size used during execution. On the other hand, a shared cache provides implicit prefetching of data, reducing cache misses when all CPU cores of the chip use the same data. In this paper we analyze the performance of the matrix-vector multiplication (MVM) algorithm, expressed as speed and speedup. We implement sequential and parallel versions on a multi-chip, multi-core multiprocessor in order to determine the CPU affinity that provides the best performance for the parallel implementation, comparing the same number of tightly coupled CPU cores and their loosely coupled counterparts. The results show that working on loosely coupled cores with private L3 caches is better than working on tightly coupled cores with a shared last-level L3 cache in the region where the problem fits in the total L3 cache of the loosely coupled CPU cores but, at the same time, does not fit in the single L3 cache of the tightly coupled CPU cores.
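The region described in the last sentence of the abstract can be illustrated with a rough size check. The following is a minimal sketch only, assuming a dense N x N matrix of 8-byte elements plus one input and one output vector; the function names, the 8 MiB L3 size, and the two-chip configuration are hypothetical and are not taken from the paper.

```python
# Illustrative sketch: all names and cache sizes here are assumptions,
# not values or code from the paper.

def mvm_working_set_bytes(n, elem_size=8):
    """Bytes needed for an n x n matrix plus the input and output vectors."""
    return (n * n + 2 * n) * elem_size


def cache_region(n, l3_bytes_per_chip, chips, elem_size=8):
    """Classify the problem by the smallest L3 capacity it fits in."""
    size = mvm_working_set_bytes(n, elem_size)
    if size <= l3_bytes_per_chip:
        return "fits in one shared L3"
    if size <= l3_bytes_per_chip * chips:
        return "fits only in the combined L3 of all chips"
    return "exceeds the combined L3"


def speedup(t_sequential, t_parallel):
    """Speedup as the ratio of sequential to parallel execution time."""
    return t_sequential / t_parallel


if __name__ == "__main__":
    L3 = 8 * 2**20  # assumed 8 MiB of L3 per chip
    for n in (512, 1200, 2000):
        print(n, cache_region(n, L3, chips=2))
```

Under these assumed sizes, the middle case (working set larger than one L3 but no larger than the combined L3) corresponds to the region in which the abstract reports that loosely coupled cores perform better. The check ignores how rows are partitioned across cores, so it is only a first approximation.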

Keywords :

cache storage; coprocessors; matrix multiplication; microprocessor chips; parallel memories; shared memory systems; MVM algorithm performance; data execution; data prefetching; loosely coupled CPU core affinity; matrix vector multiplication; multichip multicore multiprocessor; parallel implementation; private cache; sequential implementation; shared cache storage; tightly coupled CPU core affinity; Artificial neural networks; Cache memory; Instruction sets; Multicore processing; Testing; Vectors; Gustafson's law; high performance computing; shared memory multiprocessor

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information & Communication Technology Electronics & Microelectronics (MIPRO), 2013 36th International Convention on

Conference_Location :

Opatija

Print_ISBN :

978-953-233-076-2

Type :

conf

Filename :

6596257

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=632860