DocumentCode :
1799866
Title :
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
Author :
Jevdjic, Djordje ; Loh, Gabriel H. ; Kaynak, Cansu ; Falsafi, Babak
fYear :
2014
fDate :
13-17 Dec. 2014
Firstpage :
25
Lastpage :
37
Abstract :
Recent research advocates large die-stacked DRAM caches in manycore servers to break the memory latency and bandwidth wall. To realize their full potential, die-stacked DRAM caches necessitate low lookup latencies, high hit rates, and efficient use of off-chip bandwidth. Today's stacked DRAM cache designs fall into two categories based on the granularity at which they manage data: block-based and page-based. The state-of-the-art block-based design, called Alloy Cache, collocates a tag with each data block (e.g., 64B) in the stacked DRAM to provide fast access to data in a single DRAM access. However, such a design suffers from low hit rates due to poor temporal locality in the DRAM cache. In contrast, the state-of-the-art page-based design, called Footprint Cache, organizes the DRAM cache at page granularity (e.g., 4KB), but fetches only the blocks that will likely be touched within a page. In doing so, Footprint Cache achieves high hit rates with moderate on-chip tag storage and reasonable lookup latency. However, multi-gigabyte stacked DRAM caches will soon be practical and needed by server applications, thereby mandating tens of MBs of tag storage even for page-based DRAM caches. We introduce a novel stacked-DRAM cache design, Unison Cache. Similar to Alloy Cache's approach, Unison Cache incorporates the tag metadata directly into the stacked DRAM to enable scalability to arbitrary stacked-DRAM capacities. Then, leveraging insights from the Footprint Cache design, Unison Cache employs large, page-sized cache allocation units to achieve high hit rates and a reduction in tag overheads, while predicting and fetching only the useful blocks within each page to minimize off-chip traffic. Our evaluation using server workloads and caches of up to 8GB reveals that Unison Cache improves performance by 14% compared to Alloy Cache due to its high hit rate, while outperforming state-of-the-art page-based designs that require impractical SRAM-based tags of around 50MB.
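The abstract combines two mechanisms: tags stored alongside data in the stacked DRAM (so a single DRAM access returns both), and page-granularity allocation guided by a per-page footprint that records which blocks are actually touched. The C sketch below is only an illustration of these ideas under assumed, simplified parameters; the structure names, the direct-mapped toy organization, and the function `access_dram_cache` with its `predicted_footprint` argument are inventions for illustration, not the paper's implementation.

/* Illustrative sketch (not the authors' design): a page-granularity cache
 * entry whose tag is kept with the data, plus a footprint bit vector over
 * the 64B blocks of a 4KB page used to fetch only useful blocks. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE  4096u                        /* allocation unit (e.g., 4KB) */
#define BLOCK_SIZE 64u                          /* data block (e.g., 64B)      */
#define NUM_SETS   1024u                        /* toy size; real caches are GBs */

/* One cache entry: tag and footprint metadata live with the data in the
 * stacked DRAM row, so one DRAM access yields tag check plus data. */
typedef struct {
    uint64_t page_tag;      /* which physical page occupies this entry      */
    uint64_t valid_bits;    /* which of the 64 blocks are present           */
    uint64_t touched_bits;  /* which blocks were used; trains the predictor */
    bool     allocated;
} dram_cache_entry;

static dram_cache_entry cache_sets[NUM_SETS];   /* direct-mapped for simplicity */

static uint64_t page_of(uint64_t paddr)  { return paddr / PAGE_SIZE; }
static unsigned block_of(uint64_t paddr) { return (unsigned)((paddr % PAGE_SIZE) / BLOCK_SIZE); }

/* Lookup: check the in-DRAM tag; on a page miss, allocate the page and mark
 * as valid only the blocks named by a predicted footprint (passed in here). */
bool access_dram_cache(uint64_t paddr, uint64_t predicted_footprint)
{
    uint64_t page = page_of(paddr);
    unsigned blk  = block_of(paddr);
    dram_cache_entry *e = &cache_sets[page % NUM_SETS];

    if (e->allocated && e->page_tag == page) {            /* tag match */
        if (e->valid_bits & (1ull << blk)) {              /* block present */
            e->touched_bits |= 1ull << blk;
            return true;                                  /* hit */
        }
        e->valid_bits   |= 1ull << blk;                   /* page hit, block miss: */
        e->touched_bits |= 1ull << blk;                   /* fetch just this block */
        return false;
    }

    /* Page miss: a real design would train the footprint predictor with the
     * evicted entry's touched_bits, then fetch only the predicted blocks. */
    e->allocated    = true;
    e->page_tag     = page;
    e->valid_bits   = predicted_footprint | (1ull << blk);
    e->touched_bits = 1ull << blk;
    return false;
}

int main(void)
{
    /* First access to a page misses and installs a sparse footprint (blocks 0
     * and 2); a later access to a predicted block then hits in one lookup. */
    uint64_t base = 0x10000;
    printf("first access hit?  %d\n", access_dram_cache(base,       0x5ull));
    printf("second access hit? %d\n", access_dram_cache(base + 128, 0x0ull));
    return 0;
}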
Keywords :
DRAM chips; SRAM chips; cache storage; paged storage; Alloy Cache approach; DRAM access; SRAM-based tags; Unison cache; bandwidth wall; block-based data management; die-stacked DRAM cache; footprint cache design; lookup latencies; manycore servers; memory latency; multigigabyte stacked DRAM caches; off-chip bandwidth; off-chip traffic; on-chip tag storage; page granularity; page-based DRAM caches; page-based data management; page-sized cache allocation units; server workloads; stacked-DRAM capacities; Bandwidth; Metals; Organizations; Random access memory; Resource management; Servers; System-on-chip; 3D die stacking; DRAM; caches; memory; servers;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
ISSN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2014.51
Filename :
7011375