• DocumentCode
    70518
  • Title
    FP-NUCA: A Fast NOC Layer for Implementing Large NUCA Caches
  • Author
    Arora, Anuj; Harne, Mayur; Sultan, Hameedah; Bagaria, Akriti; Sarangi, Smruti R.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Delhi, New Delhi, India
  • Volume
    26
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 1 2015
  • Firstpage
    2465
  • Lastpage
    2478
  • Abstract
    NUCA caches have traditionally been proposed as a way to mitigate wire delays and the delays introduced by complex networks on chip (NOCs). Prior approaches have reported significant performance gains with intelligent block placement, location, replication, and migration schemes. In this paper, we propose a novel approach in this space called FP-NUCA. Unlike conventional approaches, it co-designs the last level cache and the NOC. We artificially constrain the communication pattern in the NUCA cache such that all messages travel along a few predefined paths (fast paths) for each set of banks. We leverage this communication pattern by designing a new type of NOC router, called the Freeze router, which augments a regular router with a layer of circuitry that gates the clock of the regular router whenever a fast-path message is waiting to be transmitted. Messages along the fast path do not require buffering, switching, or routing. We incorporate a bank predictor into our NOC to reduce the number of messages and the resulting energy consumption. We compare our design with state-of-the-art protocols and report speedups of up to 31 percent (mean: 6.3 percent) and ED² (energy × delay²) reductions of up to 46 percent (mean: 10.4 percent) for a suite of Splash and Parsec benchmarks. We implement the Freeze router in VHDL and show that the additional fast-path logic has minimal area and timing overheads.
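    The abstract reports efficiency as a reduction in ED², the product of energy and the square of delay. Below is a minimal Python sketch of how an ED² reduction relative to a baseline can be computed. The function names and input values are hypothetical and only illustrate the metric; they are not the authors' evaluation code or data.

```python
# Minimal sketch of the ED^2 (energy x delay^2) figure of merit.
# Units are arbitrary; the numbers in the usage lines are hypothetical,
# not results reported for FP-NUCA.

def ed2(energy: float, delay: float) -> float:
    """Return the energy-delay-squared product E * D**2 (lower is better)."""
    return energy * delay ** 2


def ed2_reduction_percent(base_energy: float, base_delay: float,
                          new_energy: float, new_delay: float) -> float:
    """Percentage reduction in ED^2 of a new design relative to a baseline."""
    base = ed2(base_energy, base_delay)
    new = ed2(new_energy, new_delay)
    return 100.0 * (base - new) / base


if __name__ == "__main__":
    # Cutting delay by 10% lowers ED^2 by about 19%, while cutting
    # energy by 10% lowers it by only 10%: delay is weighted quadratically.
    print(round(ed2_reduction_percent(1.0, 1.0, 1.0, 0.9), 2))  # 19.0
    print(round(ed2_reduction_percent(1.0, 1.0, 0.9, 1.0), 2))  # 10.0
```

    Because delay enters squared, a design that mainly shortens execution time (as a faster NOC path would) gains more on ED² than one that saves the same fraction of energy.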
  • Keywords
    cache storage; integrated circuit design; network routing; network-on-chip; FP-NUCA; NOC router; Parsec benchmark; Splash benchmark; VHDL; bank predictor; circuitry layer; communication pattern; complex networks-on-chip; fast NOC layer; fast path logic; freeze router; intelligent block location scheme; intelligent block migration scheme; intelligent block placement scheme; intelligent block replication scheme; large NUCA caches; last level cache; nonuniform cache architectures; performance gains; resultant energy consumption; timing overheads; wire delay mitigation; Benchmark testing; Delays; Distributed databases; Multiplexing; Ports (Computers); Program processors; Proposals; NUCA caches; bank prediction; freeze router
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Parallel and Distributed Systems
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2014.2358231
  • Filename
    6898874