DocumentCode :
692871
Title :
AUGEM: Automatically generate high performance Dense Linear Algebra kernels on x86 CPUs
Author :
Qian Wang ; Xianyi Zhang ; Yunquan Zhang ; Qing Yi
Author_Institution :
Inst. of Software, Beijing, China
fYear :
2013
fDate :
17-22 Nov. 2013
Firstpage :
1
Lastpage :
12
Abstract :
Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on a variety of multi-core CPUs without requiring any manual intervention from developers. In particular, based on domain-specific knowledge about the algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Our framework then uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of the Intel MKL and AMD ACML BLAS libraries on both Intel Sandy Bridge and AMD Piledriver processors.
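As an illustration only, the following C sketch shows two of the kernels named in the abstract (AXPY and DOT), written as plain low-level loops. Their bodies are repeated multiply-add sequences, the kind of commonly occurring pattern that a template matcher could target. The function names `axpy_ref` and `dot_ref` and the four-way unrolling are assumptions made for this example. They are not taken from AUGEM, and the sketch does not reproduce the framework's templates or its generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* AXPY: y <- alpha * x + y (hypothetical reference version). */
void axpy_ref(size_t n, double alpha, const double *x, double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        /* One load-multiply-add-store sequence per element. */
        y[i] += alpha * x[i];
    }
}

/* DOT: returns sum of x[i] * y[i] (hypothetical reference version).
 * The loop is unrolled by four with independent accumulators, so each
 * iteration exposes four identical multiply-add sequences. */
double dot_ref(size_t n, const double *x, const double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }
    double acc = (acc0 + acc1) + (acc2 + acc3);
    /* Remainder loop for lengths that are not a multiple of four. */
    for (; i < n; ++i) {
        acc += x[i] * y[i];
    }
    return acc;
}
```

In this sketch the unrolled DOT body maps naturally onto a vector register holding several partial sums, which is one reason such loops are commonly written with independent accumulators before vectorization.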
Keywords :
linear algebra; multiprocessing systems; optimising compilers; parallel processing; program assemblers; AMD ACML BLAS libraries; AMD Piledriver processors; AUGEM; AXPY; BLAS; DLA kernels; DOT; GEMM; GEMV; Intel MKL; Intel Sandy Bridge; SSE-AVX instructions; basic linear algebra subprograms; domain-specific knowledge; fully optimized assembly code generation; high performance dense linear algebra kernel automatic generation; instruction sequences; multicore CPUs; parameterized code template collection; scientific computing; specialized low-level C code optimizer; template-based optimization framework; x86 CPUs; Abstracts; Arrays; Generators; Kernel; Registers; Resource management; Seals; DLA code optimization; auto-tuning; code generation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
Conference_Location :
Denver, CO
Print_ISBN :
978-1-4503-2378-9
Type :
conf
DOI :
10.1145/2503210.2503219
Filename :
6877458