Author :
Nishikawa, Naoki ; Iwai, Keisuke ; Kurokawa, Takakazu
Author_Institution :
Dept. of Comput. Sci., Nat. Defense Acad. of Japan, Yokosuka, Japan
Abstract :
GPUs have attracted attention in almost all research fields as hardware with high cost-performance. Consequently, implementing cryptographic modules on GPUs is also becoming popular. It has become increasingly clear that encryption implemented with GPGPU is faster than FPGA implementation, for which the speedup depends heavily on programmers' tuning techniques. We have previously evaluated the effectiveness of this approach for AES. In this paper, we implement AES, Camellia, CIPHER UNICORN-A, and Hierocrypt-3, taken from the e-Government Recommended Ciphers List of the Cryptography Research and Evaluation Committees (CRYPTREC) in Japan, on CUDA, building on two previously reported insights. According to the evaluation results, the Camellia implementation on a Tesla C2050 achieved a throughput of 50.6 Gbps. In contrast, the throughput without data transfer and with overlapping of encryption on the GPU and the data copy was 27.5 Gbps. Moreover, on the Tesla C2050, a file as small as 4 MB achieved near-maximum throughput, which indicates that the Fermi architecture is more suitable for practical use as a cryptographic accelerator than the conventional GT200 architecture. Furthermore, the performance gains on each GPU followed similar trends irrespective of the cipher algorithm used. This fact might make performance prediction modeling straightforward.
Keywords :
cryptography; parallel architectures; AES; CIPHER UNICORN-A; CUDA; Camellia; Fermi architecture; GPU; Hierocrypt-3; Tesla C2050; cryptographic accelerator; cryptographic module; encryption speed; high-performance symmetric block cipher; Computer architecture; Encryption; Graphics processing unit; Instruction sets; Registers; Throughput; Acceleration; GPGPU; Symmetric Block Cipher;