DocumentCode :
3145574
Title :
GPU Accelerating for Rapid Multi-core Cache Simulation
Author :
Han, Wan ; Xiang, Long ; Xiaopeng, Gao ; Yi, Li
Author_Institution :
State Key Lab. of Virtual Reality Technol. & Syst., Beihang Univ., Beijing, China
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
1387
Lastpage :
1396
Abstract :
To find the best memory system for emerging workloads, traces are collected during an application's execution, and caches with different configurations are then simulated using these traces. Since program traces can be several gigabytes in size, cache simulation is a time-consuming process. Compute Unified Device Architecture (CUDA) is a software development platform that enables programmers to accelerate general-purpose applications on the graphics processing unit (GPU). This paper presents a real-time multi-core cache simulator, built on the Pin tool to capture memory references, together with a fast method for multi-core cache simulation on a CUDA-enabled GPU. The proposed method is accelerated by the following techniques: exploration of execution parallelism, memory latency hiding, and a novel trace compression methodology. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the hybrid parallel method presented here, which combines time-partitioning with set-partitioning, achieves an 11.10× speedup over the serial CPU simulation algorithm. The simulator can characterize the cache performance of single-threaded or multi-threaded workloads at speeds of 6-15 MIPS. At these speeds it can simulate 6 cache configurations in a single pass, whereas CMP$im can simulate only one cache configuration per pass, at speeds of 4-10 MIPS.
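The set-partitioning idea mentioned in the abstract rests on the fact that, in a set-associative cache, references that map to different sets never interact, so each set can be simulated independently (on a GPU, for example, one thread per set). The following is a minimal sketch of that idea in Python, not the authors' CUDA implementation. It assumes LRU replacement and a single cache level, and the function names (`access`, `split`, `simulate_serial`, `simulate_set_partitioned`) are hypothetical. A thread pool merely stands in for GPU threads, and time-partitioning is not shown.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

# Assumption: LRU replacement; all names here are hypothetical illustrations.

def access(lru, tag, ways):
    """Touch `tag` in one LRU-ordered cache set; return True on a miss."""
    if tag in lru:
        lru.move_to_end(tag)          # hit: mark as most recently used
        return False
    if len(lru) >= ways:
        lru.popitem(last=False)       # miss in a full set: evict the LRU line
    lru[tag] = None
    return True

def split(addr, line_size, num_sets):
    """Map a byte address to (set index, tag)."""
    block = addr // line_size
    return block % num_sets, block // num_sets

def simulate_serial(trace, line_size, num_sets, ways):
    """Baseline: walk the whole trace in order, one state per set."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        s, tag = split(addr, line_size, num_sets)
        misses += access(sets[s], tag, ways)
    return misses

def simulate_set_partitioned(trace, line_size, num_sets, ways):
    """Set-partitioning: bucket references by set, then simulate sets independently."""
    buckets = [[] for _ in range(num_sets)]
    for addr in trace:                 # preserves program order within each set
        s, tag = split(addr, line_size, num_sets)
        buckets[s].append(tag)

    def run(tags):
        lru = OrderedDict()            # private state: no sharing between sets
        return sum(access(lru, t, ways) for t in tags)

    # Stand-in for one GPU thread per set.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(run, buckets))
```

Because each set's LRU state depends only on the references mapped to that set, the partitioned version reports the same miss count as the serial walk; for example, `simulate_serial([0, 64, 0, 128, 0], 64, 1, 2)` counts 3 misses. In Python the GIL prevents a real speedup, so the parallel structure, not the performance, is the point of the sketch.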
Keywords :
cache storage; computer architecture; computer graphic equipment; coprocessors; multiprocessing systems; parallel processing; CUDA; GPU; Pin tool; compute unified device architecture; execution parallelism exploration; graphics processing unit; hybrid parallel method; memory latency hiding; memory system; multicore cache simulation; program traces; set-partitioning; time-partitioning; trace compression methodology; Computational modeling; Graphics processing unit; Instruction sets; Instruments; Parallel processing; Partitioning algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.295
Filename :
6008993