Regional Information Center for Science and Technology - Stream architectures

DocumentCode :

2865835

Title :

Stream architectures - efficiency and programmability

Author :

Erez, Mattan

Author_Institution :

Stanford Univ., CA, USA

fYear :

2004

fDate :

16-18 Nov. 2004

Firstpage :

Abstract :

Summary form only given. Stream processors are fully programmable in a high-level language, yet are capable of achieving computation efficiency comparable to fixed-function ASIC solutions (about 20 pJ/op) and can be scaled from a Gop/s (20 mW) block to a Top/s (20 W) chip in current semiconductor technology. The parallel nature of stream processors enables their performance to scale with technology. In a 2010 45 nm technology we expect an efficiency of 1 pJ/op and performance of up to 20 Top/s (20 W). A stream processor contains an array of arithmetic units that are supplied with data by a deep and explicit register hierarchy, which also serves to decouple instruction execution from unpredictable and long-latency memory operations. This decoupled and exposed-communication architecture enables a compiler to automatically map a stream application (such as a signal-flow graph) to the processing array: employing "stream scheduling" to stage the high-level movement of streams, and "communication scheduling" to schedule the data movement in the low-level kernels. This explicit optimization of communication results in almost all data and instruction movement taking place over short wires, and hence almost all energy going to useful computation. We have built a prototype streaming signal processor, Imagine, and have demonstrated streaming applications involving video compression/decompression, wireless communication, and adaptive beam-forming. We are also designing the Merrimac supercomputer, which uses a stream processor based on the same architectural principles as Imagine, illustrating the flexibility, generality, and scalability of the streaming concept. This paper describes stream architectures, stream programming systems, and streaming applications. A comparison is made to conventional DSPs, FPGAs, and ASIC solutions.
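The power figures quoted in the abstract follow from multiplying energy per operation by throughput. The short sketch below checks that arithmetic; the function name `power_watts` and the variable names are illustrative assumptions and do not come from the paper.

```python
def power_watts(energy_pj_per_op: float, ops_per_second: float) -> float:
    """Return power in watts: energy per operation (in joules) times operations per second."""
    joules_per_op = energy_pj_per_op * 1e-12  # 1 pJ = 1e-12 J
    return joules_per_op * ops_per_second


# Figures taken from the abstract; the variable names are hypothetical.
block_w = power_watts(20, 1e9)      # 20 pJ/op at 1 Gop/s
chip_w = power_watts(20, 1e12)      # 20 pJ/op at 1 Top/s
future_w = power_watts(1, 20e12)    # 1 pJ/op at 20 Top/s (the 45 nm projection)

for label, watts in [("block", block_w), ("chip", chip_w), ("45 nm chip", future_w)]:
    print(f"{label}: {watts:.3g} W")
```

Under these assumptions the three cases come out at roughly 20 mW, 20 W, and 20 W, matching the values stated in the abstract.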

Keywords :

PLD programming; array signal processing; data compression; data flow graphs; digital arithmetic; microprocessor chips; microprogramming; mobile radio; multimedia communication; network routing; signal flow graphs; system-on-chip; technological forecasting; video coding; 20 W; 20 mW; 45 nm; ASIC; DSP; FPGA; Imagine prototype streaming signal processor; Merrimac supercomputer; adaptive beam-forming; arithmetic unit array; communication optimization; communication scheduling; compiler stream application mapping; computation efficiency; data movement; decoupled exposed-communication architecture; deep explicit register hierarchy; fixed-function ASIC; high-level language programmable processors; high-level movement; instruction execution; instruction movement; low-level kernels; parallel stream processors; performance scaling; processing array; programmability; semiconductor technology; signal-flow graph; stream architectures; stream processor architectural principles; stream processors; stream programming systems; stream scheduling; streaming applications; technology scaling; unpredictable long-latency memory operations; video compression; video decompression; wireless communication; Application specific integrated circuits; Arithmetic; Computer architecture; High level languages; Kernel; Processor scheduling; Registers; Signal processing; Streaming media; Wires;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

System-on-Chip, 2004. Proceedings. 2004 International Symposium on

Print_ISBN :

0-7803-8558-6

Type :

conf

DOI :

10.1109/ISSOC.2004.1411141

Filename :

1411141

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2865835