Regional Information Center for Science and Technology (RICeST) - Performance Estimation of GPUs with Cache

DocumentCode :

2996282

Title :

Performance Estimation of GPUs with Cache

Author :

Parakh, Arun Kumar ; Balakrishnan, M. ; Paul, Kolin

Author_Institution :

Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Delhi, New Delhi, India

fYear :

2012

fDate :

21-25 May 2012

Firstpage :

2384

Lastpage :

2393

Abstract :

Performance estimation of an application on any processor is becoming an essential task, especially when the processor is used for high performance computing. Our work here presents a model to estimate the performance of various applications on a modern GPU. Recently, GPUs have become popular in high performance computing, in addition to their original application domain of graphics. We have chosen the FERMI architecture from NVIDIA as an example of a modern GPU. Our work is divided into two basic parts: first we estimate computation time, and then we estimate memory access time. Instructions in the kernel contribute significantly to the computation time. We have developed a model to count the number of instructions in the kernel, and we have found that our instruction count methodology gives an exact count. Memory access time is calculated in three steps: address trace generation, cache simulation, and average memory latency per warp. Finally, computation time is combined with memory access time to predict the total execution time. This model has been tested with micro-benchmarks as well as real-life kernels such as blowfish encryption, matrix multiplication, and image smoothing. We have found that our average estimation errors for these applications range from -7.76% to 55%.
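
The flow described in the abstract (instruction count for computation time, then address trace, cache simulation, and average memory latency, then a combined total) can be illustrated with a minimal sketch. This is not the authors' model: the function names, the cache geometry, the latency values, the LRU replacement policy, and the simple additive combination of compute and memory cycles are all assumptions chosen for illustration, and the sketch works per access rather than per warp.

```python
from collections import OrderedDict


def simulate_cache(trace, num_sets=4, ways=2, line_size=128):
    """Count hits and misses for a set-associative LRU cache.

    trace is a list of byte addresses. The geometry defaults are
    hypothetical and do not describe any real FERMI cache configuration.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        line = addr // line_size          # cache line holding this address
        cache_set = sets[line % num_sets]  # set the line maps to
        if line in cache_set:
            cache_set.move_to_end(line)    # mark as most recently used
            hits += 1
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return hits, misses


def average_memory_latency(hits, misses, hit_cycles, miss_cycles):
    """Average cycles per access, weighted by the hit and miss counts."""
    total = hits + misses
    if total == 0:
        return 0.0
    return (hits * hit_cycles + misses * miss_cycles) / total


def estimate_total_cycles(instruction_count, cycles_per_instruction, trace,
                          hit_cycles=10, miss_cycles=100):
    """Add compute cycles to memory cycles (an assumed additive model)."""
    hits, misses = simulate_cache(trace)
    compute = instruction_count * cycles_per_instruction
    latency = average_memory_latency(hits, misses, hit_cycles, miss_cycles)
    memory = (hits + misses) * latency
    return compute + memory


if __name__ == "__main__":
    # Hypothetical address trace: two lines reused, one new line.
    example_trace = [0, 4, 128, 0, 1024]
    print(simulate_cache(example_trace))
    print(estimate_total_cycles(100, 1.0, example_trace))
```

In this sketch the address trace is supplied by hand; in the setting the abstract describes, it would come from the address trace generation step for the kernel under study.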

Keywords :

cache storage; cryptography; graphics processing units; image processing; matrix multiplication; parallel architectures; performance evaluation; smoothing methods; FERMI architecture; GPU; NVIDIA; address trace generation; average memory latency per warp; blowfish encryption matrix multiplication; cache simulation; computation time estimation; high performance computing; image smoothing; instruction count methodology; kernel; memory access time estimation; micro-benchmark; performance estimation; processor; total execution time prediction; Clocks; Computational modeling; Computer architecture; Equations; Estimation; Graphics processing unit; Kernel; FERMI GPU; address trace; blocks; cache; estimation; instruction count; memory; performance; prediction; threads; warp;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International

Conference_Location :

Shanghai

Print_ISBN :

978-1-4673-0974-5

Type :

conf

DOI :

10.1109/IPDPSW.2012.328

Filename :

6270610

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2996282