Regional Information Center for Science and Technology - GPU Accelerating for Rapid Multi-core Cache Simulation

DocumentCode :

3145574

Title :

GPU Accelerating for Rapid Multi-core Cache Simulation

Author :

Han, Wan ; Xiang, Long ; Xiaopeng, Gao ; Yi, Li

Author_Institution :

State Key Lab. of Virtual Reality Technol. & Syst., Beihang Univ., Beijing, China

fYear :

2011

fDate :

16-20 May 2011

Firstpage :

1387

Lastpage :

1396

Abstract :

To find the best memory system for emerging workloads, traces are obtained during an application's execution, and caches with different configurations are then simulated using these traces. Since program traces can be several gigabytes in size, simulating cache performance is a time-consuming process. Compute Unified Device Architecture (CUDA) is a software development platform that enables programmers to accelerate general-purpose applications on the graphics processing unit (GPU). This paper presents a real-time multi-core cache simulator, built on the Pin tool to obtain memory references, and a fast method for multi-core cache simulation using a CUDA-enabled GPU. The proposed method is accelerated by three techniques: execution parallelism exploration, memory latency hiding, and a novel trace compression methodology. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the hybrid parallel method presented here, which combines time-partitioning with set-partitioning, achieves an 11.10× speedup over the serial CPU simulation algorithm. The simulator can characterize the cache performance of single-threaded or multi-threaded workloads at speeds of 6-15 MIPS. It can simulate 6 cache configurations in a single pass at these speeds, whereas CMP$im can simulate only one cache configuration per simulation pass, at speeds of 4-10 MIPS.
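
The set-partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's CUDA implementation: it assumes a set-associative cache with LRU replacement (the abstract does not state the replacement policy), and all function names and parameters below are hypothetical. The point it shows is that accesses mapping to different cache sets never interact, so each set's sub-trace can be simulated independently (for example, one GPU thread per set) and the miss counts summed.

```python
from collections import OrderedDict, defaultdict


def simulate_one_set(blocks, ways):
    """Simulate a single LRU cache set over its own sub-trace; return misses."""
    lru = OrderedDict()  # keys are block numbers, ordered oldest -> newest
    misses = 0
    for block in blocks:
        if block in lru:
            lru.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used block
            lru[block] = True
    return misses


def simulate_serial(trace, num_sets, ways, line_size):
    """Reference simulation: walk the whole trace in order on one cache."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size
        lru = sets[block % num_sets]
        if block in lru:
            lru.move_to_end(block)
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)
            lru[block] = True
    return misses


def simulate_set_partitioned(trace, num_sets, ways, line_size):
    """Split the trace by set index, then simulate every set independently.

    The per-set calls share no state, so they could run in parallel
    (assumption: one worker per set); here they run one after another.
    """
    per_set = defaultdict(list)
    for addr in trace:
        block = addr // line_size
        per_set[block % num_sets].append(block)  # order within a set is kept
    return sum(simulate_one_set(blocks, ways) for blocks in per_set.values())


if __name__ == "__main__":
    trace = [0, 64, 128, 0, 256, 64, 0, 512, 128]  # byte addresses (made up)
    print(simulate_serial(trace, num_sets=2, ways=2, line_size=64))
    print(simulate_set_partitioned(trace, num_sets=2, ways=2, line_size=64))
```

Both functions report the same miss count for a given trace because per-set ordering is preserved by the partitioning step. Time-partitioning, the other half of the hybrid method named in the abstract, is not shown here.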

Keywords :

cache storage; computer architecture; computer graphic equipment; coprocessors; multiprocessing systems; parallel processing; CUDA; GPU; Pin tool; compute unified device architecture; execution parallelism exploration; graphics processing unit; hybrid parallel method; memory latency hiding; memory system; multicore cache simulation; program traces; set-partitioning; time-partitioning; trace compression methodology; Computational modeling; Graphics processing unit; Instruction sets; Instruments; Parallel processing; Partitioning algorithms;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on

Conference_Location :

Shanghai

ISSN :

1530-2075

Print_ISBN :

978-1-61284-425-1

Electronic_ISBN :

1530-2075

Type :

conf

DOI :

10.1109/IPDPS.2011.295

Filename :

6008993

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3145574