• DocumentCode
    2379692
  • Title
    A simple latency tolerant processor
  • Author
    Nekkalapu, S. ; Akkary, H. ; Jothi, Komal ; Retnamma, Renjith ; Song, Xiaoyu
  • Author_Institution
    Electr. & Comput. Eng., American Univ. of Beirut, Beirut
  • fYear
    2008
  • fDate
    12-15 Oct. 2008
  • Firstpage
    384
  • Lastpage
    389
  • Abstract
    The advent of multi-core processors and the emergence of new parallel applications that take advantage of such processors pose difficult challenges to designers. With relatively constant die sizes, limited on-chip cache, and scarce pin bandwidth, placing more cores on a chip reduces the amount of cache and bus bandwidth available per core, thereby exacerbating the memory wall problem. How can a designer build a processor that provides a core with good single-thread performance in the presence of long-latency cache misses, while enabling as many of these cores as possible to be placed on the same die for high throughput? Conventional latency tolerant architectures that use out-of-order superscalar execution have become too complex and power-hungry for the multi-core era. Instead, we present a simple, non-blocking architecture that achieves memory latency tolerance without requiring complex out-of-order execution hardware or large, cycle-critical, and power-hungry structures such as dynamic schedulers, fully associative load and store queues, and reorder buffers. The non-blocking property of this architecture provides tolerance to hundreds of cycles of cache miss latency on a simple in-order issue core, thus allowing many more such cores to be integrated on the same die than is possible with a conventional out-of-order superscalar architecture.
  • Keywords
    cache storage; logic design; microprocessor chips; multiprocessing systems; bus bandwidth; chip cache miss latency; die size; memory latency tolerant multicore processor design; memory wall problem; out-of-order superscalar architecture execution; parallel application; scarce pin bandwidth; single-thread performance; Bandwidth; Buffer storage; Delay; Dynamic scheduling; Hardware; Memory architecture; Multicore processing; Out of order; Process design; Throughput;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Design, 2008. ICCD 2008. IEEE International Conference on
  • Conference_Location
    Lake Tahoe, CA
  • ISSN
    1063-6404
  • Print_ISBN
    978-1-4244-2657-7
  • Electronic_ISBN
    1063-6404
  • Type
    conf
  • DOI
    10.1109/ICCD.2008.4751889
  • Filename
    4751889