DocumentCode :
639332
Title :
Can lock-free and combining techniques co-exist?: a novel approach on concurrent queue
Author :
Govindaraju, Venkatraman ; Nowatzki, Tony ; Sankaralingam, Karthikeyan
Author_Institution :
Dept. of Comput. Sci., Univ. of Wisconsin - Madison, Madison, WI, USA
fYear :
2013
fDate :
7-11 Sept. 2013
Firstpage :
403
Lastpage :
404
Abstract :
Modern microprocessors exploit data-level parallelism through in-core data-parallel accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON. Although these ISA extensions have existed for decades, compilers do not generate good-quality, high-performance vectorized code without significant programmer intervention and manual optimization. The fundamental problem is that the architecture is too rigid, which overly complicates the compiler's role and simultaneously restricts the types of codes that the compiler can profitably map to these data-parallel accelerators. We take a fundamentally new approach that first makes the architecture more flexible and exposes this flexibility to the compiler. Counter-intuitively, increasing the complexity of the accelerator's interface to the compiler enables a more robust and efficient system that supports many types of codes. This system also enables the performance of auto-acceleration to be comparable to that of manually optimized implementations. To address the challenges of compiling for flexible accelerators, we propose a variant of the Program Dependence Graph, called the Access Execute Program Dependence Graph, to capture spatio-temporal aspects of memory accesses and computations. We implement a compiler that uses this representation and evaluate it on both a suite of kernels developed and tuned for SSE and the Parboil benchmarks, a set of "challenge" data-parallel applications. We show that our compiler, which targets the DySER accelerator, provides high-quality code for the kernels and full applications, commonly reaching within 30% of manually optimized code and outperforming compiler-produced SSE code by 1.8×.
Keywords :
graph theory; microprocessor chips; optimisation; parallel processing; program compilers; DySER accelerator; PDG; SIMD shackles; access execute program dependence graph; compilers; data level parallelism; data-parallel accelerators; flexible microarchitecture; microprocessors; optimization; spatio-temporal aspects; vector ISA extensions; Acceleration; Computer architecture; Hardware; Optimization; Ports (Computers); Program processors; Vectors; combining; compare-and-swap; concurrent queue; lock-free; swap;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures and Compilation Techniques (PACT), 2013 22nd International Conference on
Conference_Location :
Edinburgh
ISSN :
1089-795X
Print_ISBN :
978-1-4799-1018-2
Type :
conf
DOI :
10.1109/PACT.2013.6618830
Filename :
6618830