• DocumentCode
    37337
  • Title
    Cluster-based approach for improving graphics processing unit performance by inter streaming multiprocessors locality
  • Author
    Keshtegar, Mohammad Mahdi ; Falahati, Hajar ; Hessabi, Shaahin
  • Author_Institution
    Comput. Eng. Dept., Sharif Univ. of Technol., Tehran, Iran
  • Volume
    9
  • Issue
    5
  • fYear
    2015
  • fDate
    9 2015
  • Firstpage
    275
  • Lastpage
    282
  • Abstract
    As a new platform for high-performance and general-purpose computing, the graphics processing unit (GPU) is one of the most promising candidates for faster improvement in peak processing speed, low latency and high performance. As GPUs employ multithreading to hide latency, there is only a small private data cache in each single instruction multiple thread (SIMT) core. Hence, in many applications these cores communicate through the global memory. Access to this public memory takes a long time and consumes a large amount of power. Moreover, the memory bandwidth is limited, which is quite challenging in parallel processing. Requests that miss in the last-level cache and are followed by accesses to the slow off-chip memory harm power and performance significantly. In this research, the authors introduce a low-overhead mechanism to reduce the off-chip memory requests that are triggered by miss events in on-chip caches. The authors propose a cluster-based architecture that captures the similarity of memory requests between SIMT cores and serves missed requests with data from adjacent cores. Simulation results reveal that the proposed architecture enhances the geometric mean of instructions per cycle by 6.3% for the evaluated benchmarks, whereas the maximum gain is 22%. Furthermore, the geometric mean of the total energy consumption overhead is 4.8% for the evaluated applications.
  • Keywords
    cache storage; graphics processing units; multi-threading; multiprocessing systems; pattern clustering; power aware computing; GPU; SIMT core; SIMT cores; cluster-based approach; cluster-based architecture; energy consumption overhead; general-purpose computing; global memory; graphics processing unit performance; high performance computing; interstreaming multiprocessor locality; miss events; multithreading; off-chip memory requests; on-chip caches; parallel processing; private data cache; public memory; single instruction multiple thread core;
  • fLanguage
    English
  • Journal_Title
    Computers & Digital Techniques, IET
  • Publisher
    iet
  • ISSN
    1751-8601
  • Type
    jour
  • DOI
    10.1049/iet-cdt.2014.0092
  • Filename
    7182809