DocumentCode :
2720935
Title :
Efficient complex matrix multiplication on the Synergistic Processing Element of the Cell processor
Author :
Bourgerie, Quentin ; Fortin, Pierre ; Lamotte, Jean-Luc
Author_Institution :
Univ. Pierre et Marie Curie, Paris, France
fYear :
2010
fDate :
20-24 Sept. 2010
Firstpage :
1
Lastpage :
8
Abstract :
To implement a complete Fast Multipole Method on the Cell processor, we need an efficient complex matrix multiplication on each Synergistic Processing Element (SPE). Since the last IBM SDK does not provide such a routine, we build our own in single precision in C. We show that complex matrix multiplication requires a specific computation scheme for the micro-kernel running on the SPE, and that a 32×32 tile is appropriate both for close-to-peak computation and for overlapping communication with computation. Our micro-kernel delivers 23.74 Gflop/s, which is 92.7% of the SPE peak performance, and we obtain up to 23.65 Gflop/s for one complete complex matrix product on one SPE, and up to 378.36 Gflop/s for 16 products on 16 SPEs.
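The tiling idea in the abstract can be illustrated with a minimal, portable C sketch. It is not the authors' SPE micro-kernel: it uses no SIMD intrinsics, no DMA and no double buffering, and the names `cfloat`, `tile_cgemm` and `cgemm_tiled` are hypothetical. It assumes square row-major matrices whose order is a multiple of 32, and it only shows how a single-precision complex product C += A·B decomposes into 32×32 tile products.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32  /* tile edge taken from the abstract */

/* Hypothetical type: one single-precision complex number. */
typedef struct { float re, im; } cfloat;

/* C_tile += A_tile * B_tile for TILE x TILE tiles that live inside
 * row-major matrices with leading dimension ld. */
static void tile_cgemm(const cfloat *a, const cfloat *b, cfloat *c, size_t ld)
{
    for (size_t i = 0; i < TILE; ++i) {
        for (size_t k = 0; k < TILE; ++k) {
            const cfloat x = a[i * ld + k];
            for (size_t j = 0; j < TILE; ++j) {
                const cfloat y = b[k * ld + j];
                /* One complex multiply-add: 4 real multiplies, 4 real adds. */
                c[i * ld + j].re += x.re * y.re - x.im * y.im;
                c[i * ld + j].im += x.re * y.im + x.im * y.re;
            }
        }
    }
}

/* C += A * B for n x n row-major matrices; n must be a multiple of TILE. */
void cgemm_tiled(size_t n, const cfloat *a, const cfloat *b, cfloat *c)
{
    assert(n % TILE == 0);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t kk = 0; kk < n; kk += TILE)
                tile_cgemm(&a[ii * n + kk], &b[kk * n + jj],
                           &c[ii * n + jj], n);
}
```

With this layout, one 32×32 single-precision complex tile occupies 32 · 32 · 8 = 8192 bytes, so the three tiles of a tile product take 24 KiB. The quoted figures are also mutually consistent: 23.74 Gflop/s at 92.7% of peak implies a peak of about 25.6 Gflop/s per SPE, and 378.36 Gflop/s over 16 SPEs is about 23.65 Gflop/s each.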
Keywords :
C language; matrix multiplication; multiprocessing systems; C programming; Cell processor; complex matrix multiplication; fast multipole method; microkernel; synergistic processing element; Blades; Computer architecture; Laplace equations; Microprocessors; Pipelines; Tiles; CGEMM; SPE
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on
Conference_Location :
Heraklion, Crete
Print_ISBN :
978-1-4244-8395-2
Electronic_ISBN :
978-1-4244-8397-6
Type :
conf
DOI :
10.1109/CLUSTERWKSP.2010.5613077
Filename :
5613077