• DocumentCode
    632860
  • Title

    Loosely or tightly coupled affinity for matrix-vector multiplication

  • Author

    Velkoski, Goran ; Ristov, Sasko ; Gusev, Marjan

  • Author_Institution
    Fac. of Inf. Sci. & Comput. Eng., Ss. Cyril & Methodius Univ., Skopje, Macedonia
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    228
  • Lastpage
    233
  • Abstract
    Today's CPU cores usually have private L1 and L2 caches and share an L3 cache with the other cores on the chip (die). Whether a cache is private or shared can significantly affect the performance of a parallel algorithm, i.e. when it runs on tightly coupled CPU cores that share the same last-level L3 cache, or on loosely coupled CPU cores with a private L3 cache per chip. Private caches increase the overall cache size available during execution. On the other hand, a shared cache provides implicit prefetching of data, reducing cache misses if all CPU cores of the chip use the same data. In this paper we analyze the performance of the matrix-vector multiplication (MVM) algorithm, expressed as speed and speedup. We run sequential and parallel implementations on a multi-chip multi-core multiprocessor to determine the CPU affinity that provides the best performance for a parallel implementation using the same number of tightly coupled CPU cores and their loosely coupled counterparts. The results show that working on loosely coupled cores with private L3 caches is better than working on tightly coupled cores with a shared last-level L3 cache in the region where the problem fits in the total L3 cache of the loosely coupled CPU cores but does not fit in the single L3 cache of the tightly coupled CPU cores.
  • Keywords
    cache storage; coprocessors; matrix multiplication; microprocessor chips; parallel memories; shared memory systems; MVM algorithm performance; data execution; data prefetching; loosely coupled CPU core affinity; matrix vector multiplication; multichip multicore multiprocessor; parallel implementation; private cache; sequential implementation; shared cache storage; tightly coupled CPU core affinity; Artificial neural networks; Cache memory; Instruction sets; Multicore processing; Testing; Vectors; Gustafson's law; high performance computing; shared memory multiprocessor
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information & Communication Technology Electronics & Microelectronics (MIPRO), 2013 36th International Convention on
  • Conference_Location
    Opatija
  • Print_ISBN
    978-953-233-076-2
  • Type

    conf

  • Filename
    6596257