Title :
Optimizing SIMD Parallel Computation with Non-Consecutive Array Access in Inline SSE Assembly Language
Author :
Juan, Chen ; Canqun, Yang
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Many processors, such as the Intel Xeon 5100 series and the AMD Athlon 64, support the SIMD computation model through the Streaming SIMD Extensions (SSE), SSE2 and SSE3. Double-precision SSE/SSE2/SSE3 instructions operate on two packed double-precision floating-point elements at once in a 128-bit XMM vector register, which greatly improves floating-point performance. Sometimes, however, the data in a SIMD computation are non-consecutive rather than consecutive, which prevents SIMD optimization: two non-consecutive double-precision floating-point elements cannot be loaded into a 128-bit vector register with a single load and must instead be loaded separately. This paper addresses how to apply SIMD optimization to non-consecutive data. Loop unrolling exposes the regularity of such non-consecutive accesses, and register rotation helps transform non-consecutive data into vector data. Using a representative kernel program, we illustrate a SIMD optimization that combines loop unrolling with register rotation. By vectorizing non-consecutive data, the performance of the "KERNEL" code is improved by 42.4% and that of the PQMRCGSTAB application by 15.3%.
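The abstract does not give the kernel itself, so the following is only a minimal sketch of the general idea, under stated assumptions: a hypothetical stride-2 access pattern (y[i] = s * x[2*i]), SSE2 intrinsics in place of the paper's inline assembly, and a register shuffle standing in for the register rotation step, whose details are not described here. Unrolling the loop by two shows that x[2i] and x[2i+2] each sit in the low half of a consecutive pair, so two ordinary vector loads followed by one shuffle can pack them into a single XMM register. The function names are illustrative and not taken from the paper.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 / x86-64 only) */
#include <stddef.h>

/* Hypothetical scalar reference: y[i] = s * x[2*i]; x must hold 2*n doubles. */
void scale_stride2_scalar(const double *x, double *y, size_t n, double s) {
    for (size_t i = 0; i < n; ++i)
        y[i] = s * x[2 * i];
}

/* Same computation, unrolled by two and vectorized (assumed pattern). */
void scale_stride2_sse2(const double *x, double *y, size_t n, double s) {
    const __m128d vs = _mm_set1_pd(s);            /* [s, s] */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(x + 2 * i);      /* [x[2i],   x[2i+1]] */
        __m128d b = _mm_loadu_pd(x + 2 * i + 2);  /* [x[2i+2], x[2i+3]] */
        __m128d v = _mm_unpacklo_pd(a, b);        /* [x[2i],   x[2i+2]] */
        _mm_storeu_pd(y + i, _mm_mul_pd(v, vs));  /* y[i], y[i+1] */
    }
    for (; i < n; ++i)                            /* scalar tail for odd n */
        y[i] = s * x[2 * i];
}
```

Unaligned loads and stores are used so the sketch makes no assumption about array alignment; a tuned version would likely align the arrays and use the aligned variants.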
Keywords :
floating point arithmetic; microprocessor chips; parallel processing; program compilers; AMD Athlon 64; Intel Xeon processor 5100 series; SIMD computation model; SIMD optimization; SIMD parallel computation optimisation; SSE; XMM vector registers; floating-point data; inline SSE assembly language; nonconsecutive array access; streaming SIMD extensions; Arrays; Assembly; Kernel; Optimization; Program processors; Registers; Vectors; SIMD; SSE/SSE2/SSE3; inline assembly; loop unrolling; nonconsecutive data; register rotation;
Conference_Title :
Intelligent Computation Technology and Automation (ICICTA), 2012 Fifth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-1-4673-0470-2
DOI :
10.1109/ICICTA.2012.70