• DocumentCode
    1827136
  • Title
    Fast Linear Algebra on GPU
  • Author
    Polok, Lukas ; Smrz, Pavel
  • Author_Institution
    IT4Innovations Centre of Excellence, Brno Univ. of Technol., Brno, Czech Republic
  • fYear
    2012
  • fDate
    25-27 June 2012
  • Firstpage
    439
  • Lastpage
    444
  • Abstract
    GPUs have been successfully used to accelerate many mathematical functions and libraries. A common limitation of those libraries is that the primitives being handled must reach a minimum size before significant speedups over their CPU versions are achieved. This minimum size requirement can prove prohibitive for many applications. It can be loosened by batching operations so that there is a sufficient amount of data to perform calculations with maximal efficiency on the GPU. A fast OpenCL implementation of two basic vector functions, vector reduction and vector scaling, is described in this paper. Its performance is analyzed by running benchmarks on two of the most common GPUs in use, NVIDIA Tesla and Fermi GPUs. Reported experimental results show that our implementation significantly outperforms the current state-of-the-art GPU-based basic linear algebra library CUBLAS.
  • Keywords
    graphics processing units; linear algebra; parallel architectures; parallel languages; CUBLAS; GPU; OpenCL implementation; basic vector function; batching operation; linear algebra library; mathematical function; vector reduction; vector scaling; Benchmark testing; Graphics processing unit; Instruction sets; Kernel; Libraries; Memory management; Vectors; BLAS; CUDA; OpenCL; parallel reduction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on
  • Conference_Location
    Liverpool
  • Print_ISBN
    978-1-4673-2164-8
  • Type
    conf
  • DOI
    10.1109/HPCC.2012.66
  • Filename
    6332205