Regional Information Center for Science and Technology - Can lock-free and combining techniques co-exist?: a novel approach on concurrent queue

DocumentCode :

639332

Title :

Can lock-free and combining techniques co-exist?: a novel approach on concurrent queue

Author :

Govindaraju, Venkatraman ; Nowatzki, Tony ; Sankaralingam, Karthikeyan

Author_Institution :

Dept. of Comput. Sci., Univ. of Wisconsin - Madison, Madison, WI, USA

fYear :

2013

fDate :

7-11 Sept. 2013

Firstpage :

403

Lastpage :

404

Abstract :

Modern microprocessors exploit data-level parallelism through in-core data-parallel accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON. Although these ISA extensions have existed for decades, compilers do not generate good-quality, high-performance vectorized code without significant programmer intervention and manual optimization. The fundamental problem is that the architecture is too rigid, which overly complicates the compiler's role and simultaneously restricts the types of code that the compiler can profitably map to these data-parallel accelerators. We take a fundamentally new approach that first makes the architecture more flexible and then exposes this flexibility to the compiler. Counter-intuitively, increasing the complexity of the accelerator's interface to the compiler enables a more robust and efficient system that supports many types of code. This system also enables the performance of auto-acceleration to be comparable to that of manually optimized implementations. To address the challenges of compiling for flexible accelerators, we propose a variant of the Program Dependence Graph, called the Access Execute Program Dependence Graph, to capture spatio-temporal aspects of memory accesses and computations. We implement a compiler that uses this representation and evaluate it on both a suite of kernels developed and tuned for SSE and "challenge" data-parallel applications, the Parboil benchmarks. We show that our compiler, which targets the DySER accelerator, provides high-quality code for the kernels and full applications, commonly reaching within 30% of manually optimized code and outperforming compiler-produced SSE code by 1.8×.

Keywords :

graph theory; microprocessor chips; optimisation; parallel processing; program compilers; DySER accelerator; PDG; SIMD shackles; access execute program dependence graph; compilers; data level parallelism; data-parallel accelerators; flexible microarchitecture; microprocessors; optimization; spatio-temporal aspects; vector ISA extensions; Acceleration; Computer architecture; Hardware; Optimization; Ports (Computers); Program processors; Vectors; combining; compare-and-swap; concurrent queue; lock-free; swap;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel Architectures and Compilation Techniques (PACT), 2013 22nd International Conference on

Conference_Location :

Edinburgh

ISSN :

1089-795X

Print_ISBN :

978-1-4799-1018-2

Type :

conf

DOI :

10.1109/PACT.2013.6618830

Filename :

6618830

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=639332