DocumentCode :
1999005
Title :
Toward Automatic Optimized Code Generation for Multiprecision Modular Exponentiation on a GPU
Author :
Emmart, Niall ; Weems, Charles
Author_Institution :
Comput. Sci. Dept., Univ. of Massachusetts, Amherst, MA, USA
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
1700
Lastpage :
1707
Abstract :
Multiprecision modular exponentiation has a variety of uses, including cryptography, prime testing and computational number theory. It is also a very costly operation to compute. GPU parallelism can be used to accelerate these computations, but to use the GPU efficiently, a problem must involve a significant number of simultaneous exponentiation operations. Handling a large number of TLS/SSL encrypted sessions in a data center is a significant problem that fits this profile. We have developed a framework that enables generation of highly efficient NVIDIA PTX implementations of exponentiation operations for different GPU architectures and problem instances. One of the challenges in generating such code is that PTX is not a true assembly language, but is instead a virtual instruction set that is compiled and optimized in different ways for different generations of GPU hardware. Thus, the same PTX code runs with different levels of efficiency on different machines. Moreover, as the precision of the exponentiation values changes, each architecture has its own break-even points where a different algorithm or parallelization strategy must be employed. Making the code efficient for a given problem instance and architecture thus requires searching a multidimensional space of algorithms and configurations by generating thousands of lines of carefully constructed PTX code for each combination, executing it, validating the numerical result, and evaluating its actual performance. Our framework automates much of this process, and produces exponentiation code that is up to six times faster than the best known hand-coded implementations. More importantly, the framework enables users to find the best configuration for each new GPU architecture relatively quickly. Our framework is also the basis for the eventual creation of a multiprecision matrix arithmetic package for GPU cluster systems that will be portable across multiple generations of GPU.
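Illustrative note (not part of the original record): the generate, execute, validate, evaluate loop described in the abstract can be sketched on a CPU in plain Python. This is only a minimal sketch under stated assumptions and not the authors' framework, which generates PTX for GPUs. The function names `modexp_fixed_window` and `search_best_window`, the choice of fixed-window exponentiation, and the window size as the only tuned parameter are all hypothetical choices made for illustration.

```python
import time


def modexp_fixed_window(base, exponent, modulus, window_bits):
    """Left-to-right fixed-window modular exponentiation (illustrative only)."""
    if modulus == 1:
        return 0
    mask = (1 << window_bits) - 1
    # Precompute base^0 .. base^(2^w - 1) mod modulus.
    table = [1]
    for _ in range(mask):
        table.append(table[-1] * base % modulus)
    # Split the exponent into w-bit digits, least significant first.
    digits = []
    e = exponent
    while e:
        digits.append(e & mask)
        e >>= window_bits
    result = 1
    # Process digits from most significant to least significant.
    for digit in reversed(digits):
        for _ in range(window_bits):
            result = result * result % modulus
        result = result * table[digit] % modulus
    return result


def search_best_window(base, exponent, modulus, candidates=(1, 2, 3, 4, 5, 6)):
    """Try each candidate configuration, validate it, and time it.

    Hypothetical stand-in for the search the abstract describes: each
    configuration is executed, checked against a trusted reference
    (Python's built-in pow), and kept only if the result is correct.
    """
    expected = pow(base, exponent, modulus)
    timings = {}
    for w in candidates:
        start = time.perf_counter()
        got = modexp_fixed_window(base, exponent, modulus, w)
        elapsed = time.perf_counter() - start
        if got != expected:
            continue  # reject configurations that fail validation
        timings[w] = elapsed
    best = min(timings, key=timings.get)
    return best, timings
```

A call such as `search_best_window(3, 2**1024 - 1, 2**1023 + 1155)` returns the fastest validated window size together with the measured timings; which size wins depends on the machine and the operand size, which is the break-even behavior the abstract refers to.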
Keywords :
graphics processing units; multiprocessing systems; parallel processing; program compilers; GPU cluster systems; GPU parallelism; PTX code; automatic optimized code generation; break-even points; multiprecision multiprocessing modular exponentiation; multiprocessing matrix arithmetic package; virtual instruction set; Assembly; Computer architecture; Educational institutions; Generators; Graphics processing units; Libraries; Registers; GPU; PTX code generation; RSA; modular exponentiation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
Type :
conf
DOI :
10.1109/IPDPSW.2013.149
Filename :
6651068