Author_Institution:
NVIDIA Corp., Santa Clara, CA, USA
Abstract:
Throughput-optimized processors, such as graphics processing units (GPUs), have scaled at historic rates in recent years, and continue to do so along a design trajectory that is largely unhindered by conventional dogmas and legacies. These processors recognize that two aspects of machine organization are critical to performance: parallel execution and hierarchical memory organization. Conventional processors, which present an illusion of sequential execution and of uniform, flat memory, find their performance increasing only slowly over time, and their evolution is at an end. In contrast, throughput processors embrace, rather than deny, parallelism and memory hierarchy to realize large performance and efficiency advantages. Throughput processors have hundreds of cores today and will have thousands of cores by 2015. They will deliver most of the performance, and most of the user value, in future computer systems. This talk will discuss some of the challenges and opportunities in the architecture and programming of future throughput processors as they relate to the EDA world. First, CAD tools, flows, and methodologies are clearly crucial to the design of these processors, and these must adapt to support such designs. Second, in this changing landscape, CAD tools themselves must evolve to run on throughput processors. In throughput processors, performance derives from the parallelism of the plentiful arithmetic units, and efficiency derives from locality, which mitigates the communication bandwidth bottlenecks that dominate cost, performance, and power. This talk will discuss the exploitation of parallelism and locality, with examples drawn from the Imagine and Merrimac projects, from NVIDIA GPUs, and from three generations of stream programming systems.
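As an illustrative sketch only (not drawn from the talk itself), the following CUDA kernel shows the two themes at work: parallel execution, with one thread computing each output element, and locality, with tiles of the operands staged in on-chip shared memory so that each value fetched from DRAM is reused TILE times. The matrix-multiply workload, the kernel name matmul_tiled, and the sizes N and TILE are hypothetical choices made for this example.

```cuda
// Illustrative sketch: tiled matrix multiply demonstrating parallelism
// (one thread per output element) and locality (shared-memory tiling
// to reduce global-memory traffic). Assumes N is a multiple of TILE.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // tile edge; hypothetical size for this example

__global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];  // per-block staging buffers: locality
    __shared__ float Bs[TILE][TILE];  // means far fewer DRAM accesses

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Each thread cooperatively loads one element of each tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();                // tile fully resident before use

        for (int k = 0; k < TILE; ++k)  // each loaded value reused TILE times
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                // done with this tile
    }
    C[row * N + col] = acc;
}

int main() {
    const int N = 512;                  // assumed multiple of TILE
    size_t bytes = N * N * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE, TILE);
    dim3 grid(N / TILE, N / TILE);      // thousands of threads in flight
    matmul_tiled<<<grid, block>>>(A, B, C, N);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expect %f)\n", C[0], 2.0f * N);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Without the shared-memory tiles, every multiply-add would require two global-memory loads; with them, global traffic drops by a factor of TILE, which is exactly the kind of locality-driven efficiency the abstract describes.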