Regional Information Center for Science and Technology (RICeST) - Performance and area aware replacement policy for GPU architecture

DocumentCode :

1776953

Title :

Performance and area aware replacement policy for GPU architecture

Author :

Abadi, Fatemeh Kazemi Hassan ; Safari, Saeed

Author_Institution :

Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran

fYear :

2014

fDate :

29-30 Oct. 2014

Firstpage :

497

Lastpage :

503

Abstract :

Recent studies have shown that cache partitioning is an efficient technique for improving throughput in multi-core processors. Existing cache partitioning algorithms assume Least Recently Used (LRU) as the underlying replacement policy. We propose using the older tree-based PLRU policy on two-level caches in GPUs, with speedup higher than, or performance matching, that of LRU. The algorithm is based on Pseudo-LRU, which uses a binary tree to reduce area overhead. It also uses set-dueling to dynamically adapt its insertion and promotion. We evaluate the effect of this policy on both L1 and L2 caches in GPUs. We propose high-accuracy profiling logic and cache partitioning hardware for our scheme. We evaluate the hardware costs in terms of performance, miss rates, DRAM locality, area, and energy, and compare them with LRU and FIFO partitioning algorithms. We define a set of machine models to discuss our scheme on some general-purpose workloads. The results show that our solutions impose negligible performance degradation compared with LRU. We then use insertion and promotion vectors to compensate for the drop in performance. On compute workloads, the technique reduces the L2 miss rate by about 10.11%.
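
The binary-tree bookkeeping mentioned in the abstract can be illustrated with a minimal sketch of textbook tree-based PLRU for a single cache set. This is not the authors' implementation: the class name `TreePLRU`, its methods, and the bit convention (0 means the next victim is in the left subtree) are assumptions made for illustration, and the insertion and promotion vectors and set-dueling described in the abstract are not modeled. The sketch shows why the area cost is low: a set with N ways needs only N - 1 tree bits.

```python
class TreePLRU:
    """Textbook tree-based pseudo-LRU state for one cache set (illustrative only)."""

    def __init__(self, ways):
        # The binary tree requires a power-of-two associativity.
        if ways < 2 or ways & (ways - 1):
            raise ValueError("ways must be a power of two >= 2")
        self.ways = ways
        # One bit per internal node, stored in heap order (root at index 0).
        # Assumed convention: 0 = next victim is in the left subtree, 1 = right.
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access to `way`: every node on its path points away from it."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1      # accessed the left half, so evict from the right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0      # accessed the right half, so evict from the left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the tree bits from the root to the pseudo-least-recently-used way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(4)
plru.touch(0)
print(plru.victim())  # prints 2
```

For a 4-way set the state is three bits. After way 0 is accessed, the root and the left child both point away from it, so the next victim is taken from the right half of the set.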

Keywords :

cache storage; graphics processing units; multiprocessing systems; performance evaluation; DRAM locality; FIFO partitioning algorithms; GPU architecture; L2 miss rate; area aware replacement policy; binary tree; cache partitioning algorithms; general purpose workloads; insertion vectors; least recently used; miss rates; multicore processors; performance aware replacement policy; performance degradation; performance matching; promotion vectors; pseudo LRU; set-dueling; tree-based PLRU; two-level caches; Benchmark testing; Computational modeling; Graphics processing units; Measurement; Memory management; Random access memory; Vectors; GPU; Insertion and promotion vector (IPV); Tree-based PLRU;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer and Knowledge Engineering (ICCKE), 2014 4th International eConference on

Conference_Location :

Mashhad

Print_ISBN :

978-1-4799-5486-5

Type :

conf

DOI :

10.1109/ICCKE.2014.6993378

Filename :

6993378

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1776953