DocumentCode
3706571
Title
Generating Efficient Tensor Contractions for GPUs
Author
Thomas Nelson;Axel Rivera;Prasanna Balaprakash;Mary Hall;Paul D. Hovland;Elizabeth Jessup;Boyana Norris
Author_Institution
Dept. of Comput. Sci., Univ. of Colorado, Boulder, CO, USA
fYear
2015
Firstpage
969
Lastpage
978
Abstract
Many scientific and numerical applications, including quantum chemistry modeling and fluid dynamics simulation, require tensor product and tensor contraction evaluation. Tensor computations are characterized by arrays with numerous dimensions, inherent parallelism, moderate data reuse, and many degrees of freedom in the order in which to perform the computation. The best-performing implementation is heavily dependent on the tensor dimensionality and the target architecture. In this paper, we map tensor computations to GPUs, starting with a high-level tensor input language and producing efficient CUDA code as output. Our approach is to combine tensor-specific mathematical transformations with a GPU decision algorithm, machine learning, and autotuning of a large parameter space. Generated code shows significant performance gains over sequential and OpenMP parallel code, and a comparison with OpenACC shows the importance of autotuning and other optimizations in our framework for achieving efficient results.
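To make the abstract's subject concrete, the following is a minimal sketch of a tensor contraction of the kind such frameworks generate code for. The shapes and index names are illustrative assumptions, not taken from the paper; NumPy's `einsum` is used here only as a reference semantics for the operation, whereas the paper's system emits optimized CUDA.

```python
import numpy as np

# Illustrative contraction: C[a,b,i,j] = sum_{k,l} A[a,b,k,l] * B[k,l,i,j]
# (hypothetical shapes; this pattern resembles contractions in
# quantum chemistry methods mentioned in the abstract)
A = np.random.rand(4, 4, 6, 6)
B = np.random.rand(6, 6, 5, 5)

# einsum expresses the contraction declaratively over the shared
# indices k and l, leaving the loop order to the implementation --
# exactly the degree of freedom the paper's autotuner explores.
C = np.einsum('abkl,klij->abij', A, B)
print(C.shape)  # contracted indices k, l are summed away
```

Note that the same contraction admits many loop orders and tiling choices; on a GPU, which indices map to thread blocks versus threads is one of the tuning decisions the paper's framework automates.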
Keywords
"Tensile stress","Graphics processing units","Computer architecture","DSL","Optimization","Indexes","Parallel processing"
Publisher
ieee
Conference_Titel
2015 44th International Conference on Parallel Processing (ICPP)
ISSN
0190-3918
Type
conf
DOI
10.1109/ICPP.2015.106
Filename
7349652
Link To Document