DocumentCode :
3698308
Title :
Restructuring and implementations of 2D matrix transpose algorithm using SSE4 vector instructions
Author :
Ahmed S. Zekri
Author_Institution :
Department of Mathematics and Computer Science, Faculty of Sciences, Beirut Arab University, Beirut, Lebanon
fYear :
2015
Firstpage :
1
Lastpage :
7
Abstract :
Current general-purpose processors are augmented with vector instructions that can process many elements of matrices and vectors in parallel. Transposing a matrix in place is a core kernel operation required by many scientific and engineering applications to shuttle data before, during, or after processing. This operation increases the traffic on the memory bus, and hence techniques such as blocking are required to enhance performance. In this paper, we present an enhanced version of a previously published algorithm for transposing a matrix on two-dimensional processor arrays. We restructured this algorithm to fit the one-dimensional vector register architecture of general-purpose CPUs. We implemented the new vector algorithm using the Intel SSE4 vector instruction set and compared its performance with the standard sequential algorithm as well as an existing implementation of Eklundh's algorithm. We also studied automatic compiler optimizations and their effect on the vectorization of the algorithm. The best of our implementations showed a maximum speedup of 1.6 over the sequential algorithm and nearly equal performance to the Eklundh's algorithm implementation.
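To make the blocking idea in the abstract concrete, the following is a minimal sketch of a blocked in-place transpose built from 4x4 register tiles. It is not the paper's restructured algorithm and not Eklundh's algorithm; it is the common tile-swap technique, shown only for illustration. The function and helper names are hypothetical, the matrix is assumed to be square, row-major, single-precision, with a dimension that is a multiple of 4, and only baseline SSE intrinsics from `<xmmintrin.h>` are used rather than anything specific to SSE4.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics and the _MM_TRANSPOSE4_PS macro */

/* Hypothetical helper: load the 4x4 tile starting at p (row stride n)
   into four vector registers and transpose it in registers. */
static void load_tile_transposed(const float *p, size_t n, __m128 r[4]) {
    r[0] = _mm_loadu_ps(p);
    r[1] = _mm_loadu_ps(p + n);
    r[2] = _mm_loadu_ps(p + 2 * n);
    r[3] = _mm_loadu_ps(p + 3 * n);
    _MM_TRANSPOSE4_PS(r[0], r[1], r[2], r[3]);
}

/* Hypothetical helper: store four vector registers as the rows of a 4x4 tile. */
static void store_tile(float *p, size_t n, const __m128 r[4]) {
    for (int k = 0; k < 4; ++k) {
        _mm_storeu_ps(p + (size_t)k * n, r[k]);
    }
}

/* In-place transpose of a row-major n x n float matrix.
   Assumption for this sketch: n is a multiple of 4. */
void transpose_inplace_sse(float *a, size_t n) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* Diagonal tile: transpose and write back to the same place. */
        __m128 d[4];
        load_tile_transposed(a + i * n + i, n, d);
        store_tile(a + i * n + i, n, d);

        /* Off-diagonal tiles: transpose the upper and lower tiles,
           then store each one in the other's position. */
        for (size_t j = i + 4; j < n; j += 4) {
            __m128 upper[4], lower[4];
            load_tile_transposed(a + i * n + j, n, upper);
            load_tile_transposed(a + j * n + i, n, lower);
            store_tile(a + j * n + i, n, upper);
            store_tile(a + i * n + j, n, lower);
        }
    }
}
```

Unaligned loads and stores are used so the sketch makes no assumption about buffer alignment; an implementation tuned for performance would more likely align the rows and choose tile sizes with the cache hierarchy in mind.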
Keywords :
"Registers","Program processors","Signal processing algorithms","Parallel processing","Yttrium","Multicore processing","Kernel"
Publisher :
IEEE
Conference_Title :
Applied Research in Computer Science and Engineering (ICAR), 2015 International Conference on
Type :
conf
DOI :
10.1109/ARCSE.2015.7338144
Filename :
7338144