A simple latency tolerant processor

Author

Nekkalapu, S. ; Akkary, H. ; Jothi, Komal ; Retnamma, Renjith ; Song, Xiaoyu

Author_Institution

Electr. & Comput. Eng., American Univ. of Beirut, Beirut

fYear

2008

fDate

12-15 Oct. 2008

Firstpage

384

Lastpage

389

Abstract

The advent of multi-core processors and the emergence of new parallel applications that take advantage of such processors pose difficult challenges to designers. With relatively constant die sizes, limited on-chip cache, and scarce pin bandwidth, placing more cores on a chip reduces the cache and bus bandwidth available per core, thereby exacerbating the memory wall problem. How can a designer build a processor that provides a core with good single-thread performance in the presence of long-latency cache misses, while enabling as many of these cores as possible to be placed on the same die for high throughput? Conventional latency-tolerant architectures that use out-of-order superscalar execution have become too complex and power-hungry for the multi-core era. Instead, we present a simple, non-blocking architecture that achieves memory latency tolerance without requiring complex out-of-order execution hardware or large, cycle-critical, and power-hungry structures such as dynamic schedulers, fully associative load and store queues, and reorder buffers. The non-blocking property of this architecture provides tolerance to hundreds of cycles of cache miss latency on a simple in-order issue core, thus allowing many more such cores to be integrated on the same die than is possible with a conventional out-of-order superscalar architecture.

Keywords

cache storage; logic design; microprocessor chips; multiprocessing systems; bus bandwidth; chip cache miss latency; die size; memory latency tolerant multicore processor design; memory wall problem; out-of-order superscalar architecture execution; parallel application; scarce pin bandwidth; single-thread performance; Bandwidth; Buffer storage; Delay; Dynamic scheduling; Hardware; Memory architecture; Multicore processing; Out of order; Process design; Throughput;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Design, 2008. ICCD 2008. IEEE International Conference on

Conference_Location

Lake Tahoe, CA

ISSN

1063-6404

Print_ISBN

978-1-4244-2657-7

Electronic_ISBN

1063-6404

Type

conf

DOI

10.1109/ICCD.2008.4751889

Filename

4751889