• DocumentCode
    170685
  • Title

    A low-cost memory interface for high-throughput accelerators

  • Author

    Jing Huang; Yuanjie Huang; Olivier Temam; Paolo Ienne; Yunji Chen; Chengyong Wu

  • Author_Institution
    State Key Lab. of Comput. Archit., Inst. of Comput. Technol., Beijing, China
  • fYear
    2014
  • fDate
    12-17 Oct. 2014
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Heterogeneous multi-cores, a mix of cores and accelerators, are becoming prevalent. These accelerators are designed for both speed and energy improvements, and thus they increasingly come with a large number of load/store ports to achieve a high degree of parallelism. However, beyond GPGPUs, accelerators such as ASICs and CGRAs are increasingly capable of accelerating computations with irregular control flow and memory accesses; as a result, such accelerators need to be plugged into caches instead of scratchpads, yet few studies focus on accelerator-to-cache interfaces. The main existing alternative is the Load/Store Queue (LSQ), traditionally used to connect superscalar processors to caches and memory, but in the context of accelerators LSQs are overkill and could significantly reduce the area and power benefits of accelerators. Moreover, we show that they are not a good fit for accelerators plugged into multi-banked caches. In this article, we propose a fast accelerator-to-cache interface with a moderate area and power footprint compared to LSQs, even for a large number of load/store ports. For that purpose, we introduce a set of low-overhead techniques for ensuring in-order delivery of requests to/from cache banks. We synthesize and lay out at 65 nm the design of both our interface and an LSQ specially adapted to accelerators for a fair comparison. We find that our interface achieves on average 78% of the performance of an LSQ using only 16% of the area and 24% of the power.
  • Keywords
    application specific integrated circuits; cache storage; graphics processing units; integrated circuit design; multiprocessing systems; ASIC; CGRA; GPGPU; LSQ; accelerator-to-cache interfaces; cache banks; control flow; heterogeneous multicores; high-throughput accelerators; load-store queues; memory accesses; memory interface; multibanked caches; superscalar processors; Acceleration; Application specific integrated circuits; Computer architecture; Layout; Out of order; Ports (Computers)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2014 International Conference on
  • Conference_Location
    Jaypee Greens
  • Type

    conf

  • DOI
    10.1145/2656106.2656109
  • Filename
    6972464