• DocumentCode
    692871
  • Title

    AUGEM: Automatically generate high performance Dense Linear Algebra kernels on x86 CPUs

  • Author

    Qian Wang ; Xianyi Zhang ; Yunquan Zhang ; Qing Yi

  • Author_Institution
    Inst. of Software, Beijing, China
  • fYear
    2013
  • fDate
    17-22 Nov. 2013
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on a variety of multi-core CPUs without requiring any manual intervention from developers. In particular, based on domain-specific knowledge about the algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Then, our framework uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of the Intel MKL and AMD ACML BLAS libraries on both Intel Sandy Bridge and AMD Piledriver processors.
  • Keywords
    linear algebra; multiprocessing systems; optimising compilers; parallel processing; program assemblers; AMD ACML BLAS libraries; AMD Piledriver processors; AUGEM; AXPY; BLAS; DLA kernels; DOT; GEMM; GEMV; Intel MKL; Intel Sandy Bridge; SSE-AVX instructions; basic linear algebra subprograms; domain-specific knowledge; fully optimized assembly code generation; high performance dense linear algebra kernel automatic generation; instruction sequences; multicore CPUs; parameterized code template collection; scientific computing; specialized low-level C code optimizer; template-based optimization framework; x86 CPUs; Abstracts; Arrays; Generators; Kernel; Registers; Resource management; Seals; DLA code optimization; auto-tuning; code generation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
  • Conference_Location
    Denver, CO
  • Print_ISBN
    978-1-4503-2378-9
  • Type

    conf

  • DOI
    10.1145/2503210.2503219
  • Filename
    6877458