Title :
Reducing GPU offload latency via fine-grained CPU-GPU synchronization
Author :
Lustig, Daniel ; Martonosi, Margaret
Abstract :
GPUs are seeing increasingly widespread use for general-purpose computation due to their excellent performance for highly parallel, throughput-oriented applications. For many workloads, however, the performance benefits of offloading are hindered by the large and unpredictable overheads of launching GPU kernels and of transferring data between CPU and GPU.
Keywords :
application program interfaces; graphics processing units; API; data latency predictability; data transfer; early kernel launch; fine-grained CPU-GPU synchronization; hardware support; offload latency reduction; overheads reduction; proactive data returns; program execution; real-system measurements; software support; throughput-oriented applications; Central Processing Unit; Hardware; Instruction sets; Kernel; Random access memory; Synchronization
Conference_Title :
2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA 2013)
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4673-5585-8
DOI :
10.1109/HPCA.2013.6522332