DocumentCode :
656183
Title :
Java with Auto-parallelization on Graphics Coprocessing Architecture
Author :
Guodong Han ; Chenggang Zhang ; King Tin Lam ; Cho-Li Wang
Author_Institution :
Dept. of Comput. Sci., Univ. of Hong Kong, Hong Kong, China
fYear :
2013
fDate :
1-4 Oct. 2013
Firstpage :
504
Lastpage :
509
Abstract :
GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption, however, hinges on better parallelization and load scheduling techniques that make the hybrid system of CPU and GPU cores easy to use efficiently. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, to help Java applications harness the full power of a heterogeneous system. Japonica presents an all-round system design that unifies the programming style and language for transparent use of both CPU and GPU resources, automatically parallelizing all kinds of loops and scheduling workloads efficiently across the CPU-GPU border. Guided by simple user annotations, sequential Java source code is analyzed, translated and compiled into a dual executable consisting of CUDA kernels and multiple Java threads, which run on GPU and CPU cores respectively. Annotated loops are automatically split into loop chunks (or tasks) that are scheduled to execute on all available GPU/CPU cores. By implementing a GPU-tailored thread-level speculation (TLS) model, Japonica supports speculative execution of loops with moderate dependency densities and privatization of loops that have only false dependencies on the GPU side. The scheduler also supports task stealing and task sharing algorithms that allow swift load redistribution across GPU and CPU. Experimental results show that Japonica, on average, runs 10x, 2.5x and 2.14x faster than the best serial (1-thread CPU), GPU-alone and CPU-alone versions respectively.
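The chunking and task-stealing idea in the abstract can be illustrated with a minimal sketch in plain Java. This is not Japonica's implementation or API: the class name ChunkStealingDemo, the Chunk type, the run method and its parameters are all hypothetical, both "devices" are ordinary CPU threads standing in for GPU and CPU workers, and the loop body is assumed to have no cross-iteration dependencies (so no speculation or privatization is shown).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical illustration only; names and structure are assumptions, not Japonica's API.
public class ChunkStealingDemo {

    /** A half-open range [start, end) of loop iterations. */
    static final class Chunk {
        final int start;
        final int end;
        Chunk(int start, int end) { this.start = start; this.end = end; }
    }

    /** Computes out[i] = i * i for i in [0, n) with chunked, work-stealing workers. */
    static long[] run(int n, int chunkSize, int workers) {
        if (n < 0 || chunkSize <= 0 || workers <= 0) {
            throw new IllegalArgumentException("n >= 0, chunkSize > 0, workers > 0 required");
        }
        long[] out = new long[n];

        // One deque of pending chunks per worker.
        List<ConcurrentLinkedDeque<Chunk>> queues = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            queues.add(new ConcurrentLinkedDeque<>());
        }

        // Split the iteration space into chunks and deal them out round-robin.
        int next = 0;
        for (int start = 0; start < n; start += chunkSize) {
            int end = Math.min(start + chunkSize, n);
            queues.get(next % workers).addLast(new Chunk(start, end));
            next++;
        }

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int self = w;
            Thread t = new Thread(() -> {
                while (true) {
                    // Take from the head of the worker's own queue first.
                    Chunk chunk = queues.get(self).pollFirst();
                    // If it is empty, try to steal from the tail of a peer's queue.
                    for (int v = 0; chunk == null && v < workers; v++) {
                        if (v != self) {
                            chunk = queues.get(v).pollLast();
                        }
                    }
                    if (chunk == null) {
                        return; // no chunks are added after start-up, so all work is done
                    }
                    // Independent iterations: each index is written by exactly one chunk.
                    for (int i = chunk.start; i < chunk.end; i++) {
                        out[i] = (long) i * i;
                    }
                }
            });
            threads.add(t);
            t.start();
        }

        for (Thread t : threads) {
            try {
                t.join(); // join makes the workers' writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] out = run(1000, 64, 2);
        System.out.println("out[10] = " + out[10]);
    }
}
```

Stealing from the tail of a peer's queue while the owner takes from the head is a common way to keep the two from contending for the same chunk; whether Japonica's scheduler makes the same choice is not stated in the abstract.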
Keywords :
Java; graphics processing units; multiprocessing systems; parallel architectures; program compilers; source code (software); CPU cores; CUDA kernels; GPU cores; GPU-based many-core accelerators; GPU-tailored thread-level speculation; Japonica; Java applications; graphics coprocessing architecture; load scheduling technique; loops execution; multiple Java threads; runtime system; sequential Java source code; supercomputing; task sharing algorithms; user-friendly compiler framework; Central Processing Unit; Graphics processing units; Instruction sets; Java; Kernel; Parallel processing; Programming; GPGPU; Multi-Cores; Parallelization; Profiling; Scheduling; Speculation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing (ICPP), 2013 42nd International Conference on
Conference_Location :
Lyon
ISSN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2013.62
Filename :
6687386