Title :
Breaking the bandwidth wall in chip multiprocessors
Author :
Vega, Augusto ; Cabarcas, Felipe ; Ramírez, Alex ; Valero, Mateo
Author_Institution :
Barcelona Supercomput. Center, Univ. Politec. de Catalunya, Barcelona, Spain
Abstract :
In throughput-aware CMPs such as GPUs and DSPs, software-managed streaming memory systems are an effective way to tolerate high latencies. For example, the Cell/B.E. incorporates local memories, and data transfers to and from those memories are overlapped with computation using DMAs. In such designs, the latency of the memory system has little impact on performance; instead, memory bandwidth becomes critical. As the number of cores increases, conventional DRAMs no longer suffice to satisfy the bandwidth demand. Hence, recent throughput-aware CMPs have adopted caches to filter off-chip traffic. However, such caches are optimized for latency, not bandwidth. This work presents a redesign of the memory system in throughput-aware CMPs. Instead of a traditional latency-aware cache, we propose to spread the address space, using fine-grained interleaving, across a shared non-coherent last-level cache (LLC). In this way, on-chip storage is optimally used, with no need to maintain coherence. On the memory side, we also propose interleaving across DRAMs, but with a much finer granularity than the usual page-sized approaches. Our proposal is highly optimized for bandwidth, not latency, by avoiding data replication in the LLC and by using fine-grained address space interleaving in both the LLC and the memory. For a CMP with 128 cores and a 64-MB LLC, performance is improved by 21% due to the LLC optimizations and by an extra 42% due to the off-chip memory optimizations, for a total performance improvement of 1.7 times.
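The effect of interleaving granularity described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bank count, and the 128-byte and 4-KB granularities are hypothetical values chosen only to show why a finer granularity spreads one streaming transfer over more banks (LLC banks or DRAM channels) than a page-sized one.

```python
# Illustrative sketch only; all names and parameters below are assumptions,
# not values taken from the paper.

def interleaved_bank(addr, num_banks, granularity):
    """Return the bank index that holds byte address `addr` when the
    address space is interleaved in chunks of `granularity` bytes."""
    return (addr // granularity) % num_banks


def banks_touched(start, length, num_banks, granularity):
    """Return the set of banks accessed by a contiguous transfer of
    `length` bytes beginning at byte address `start`."""
    first_chunk = start // granularity
    last_chunk = (start + length - 1) // granularity
    return {chunk % num_banks for chunk in range(first_chunk, last_chunk + 1)}


NUM_BANKS = 8        # hypothetical number of LLC banks or DRAM channels
FINE = 128           # hypothetical fine-grained interleaving unit (bytes)
PAGE = 4096          # hypothetical page-sized interleaving unit (bytes)
TRANSFER = 4096      # one hypothetical 4-KB DMA-style streaming transfer

fine_banks = banks_touched(0, TRANSFER, NUM_BANKS, FINE)
coarse_banks = banks_touched(0, TRANSFER, NUM_BANKS, PAGE)

print("fine-grained banks used:", len(fine_banks))
print("page-sized banks used:  ", len(coarse_banks))
```

Under these assumed parameters, the fine-grained mapping spreads the transfer over every bank, while the page-sized mapping sends the whole transfer to a single bank, so only that bank's bandwidth is available to it.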
Keywords :
cache storage; microprocessor chips; multiprocessing systems; DSP; GPU; LLC optimization; bandwidth demand; chip multiprocessors; data replication; digital signal processors; dynamic memory allocation; graphics processing unit; last-level cache; off-chip memory optimization; on-chip storage; software-managed streaming memory systems; Bandwidth; Coherence; Computer architecture; Organizations; Program processors; Proposals; Random access memory;
Conference_Title :
Embedded Computer Systems (SAMOS), 2011 International Conference on
Conference_Location :
Samos
Print_ISBN :
978-1-4577-0802-2
Electronic_ISBN :
978-1-4577-0801-5
DOI :
10.1109/SAMOS.2011.6045469