DocumentCode :
467549
Title :
Introduction to Programming High Performance Applications on the CELL Broadband Engine
Author :
Kurzak, Jakub ; Buttari, Alfredo
Author_Institution :
Univ. of Tennessee, Knoxville
fYear :
2007
fDate :
22-24 Aug. 2007
Firstpage :
11
Lastpage :
11
Abstract :
Summary form only given. Programming the STI CELL processor is about successfully exploiting its potential for delivering very high performance. The purpose of this tutorial is to give the programmer practical guidelines for achieving this goal. We begin with a brief overview of the main CELL architectural features and its software development environment. Then we discuss three basic aspects of CELL programming: SPE SIMD kernel development (vectorization), SPE parallelization, and intra-chip communication. We show how high-performance SPE kernels are created by replacing scalar operations with vector ones, heavily unrolling loops, and exploiting the dual-issue nature of the SPE architecture. We explain coding using SIMD C language extensions (intrinsics) as well as assembly language, and discuss aspects specific to code development in assembly. We present static performance analysis using the spu-timing tool. The presentation of intra-chip communication follows, with emphasis on DMA communication both for bulk data transfers and for synchronization. We discuss message size and alignment restrictions, enforcing message ordering using barrier and fence mechanisms, and creating complex data transfers using DMA lists. We conclude the topic with guidelines on implementing pipelined processing with direct local store to local store communication. We discuss basic profiling techniques using the SPE decrementer. We conclude with a set of practical tips and tricks and a list of "gotchas", or common rookie mistakes. A brief overview of academic and commercial CELL programming packages follows, along with a discussion of a real-life example: scanning network traffic using DFA-based string matching. The tutorial ends with a presentation of techniques for programming multi-CELL systems using message passing with MPI.
Keywords :
C language; application program interfaces; assembly language; message passing; multiprocessing systems; parallel processing; pipeline processing; program control structures; programming environments; string matching; CELL architectural features; CELL broadband engine; CELL programming packages; DFA-based string matching; DMA communication; SIMD C language extensions; SPE SIMD kernel development; SPE decrementer; SPE parallelization; STI CELL processor; alignment restrictions; assembly language; bulk data transfers; code development; high performance application programming; intra-chip communication; message passing interface; message size restrictions; network traffic scanning; pipelined processing; profiling techniques; software development environment; spu-timing tool; static performance analysis; unrolling loops; vectorization; Assembly; Computer architecture; Engines; Guidelines; Kernel; Packaging; Parallel programming; Performance analysis; Programming profession; Telecommunication traffic;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High-Performance Interconnects, 2007. HOTI 2007. 15th Annual IEEE Symposium on
Conference_Location :
Stanford, CA
ISSN :
1550-4794
Print_ISBN :
978-0-7695-2979-0
Type :
conf
DOI :
10.1109/HOTI.2007.30
Filename :
4296801