Regional Information Center for Science and Technology - Understanding the Memory Behavior of Emerging Multi-core Workloads

DocumentCode :

1828224

Title :

Understanding the Memory Behavior of Emerging Multi-core Workloads

Author :

Lin, Junmin ; Chen, Yu ; Li, Wenlong ; Jaleel, Aamer ; Tang, Zhizhong

Author_Institution :

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

fYear :

2009

fDate :

June 30 2009-July 4 2009

Firstpage :

153

Lastpage :

160

Abstract :

This paper characterizes the memory behavior of emerging RMS (recognition, mining, and synthesis) workloads for future multi-core processors. As multi-core processors proliferate across different application domains and the number of on-die cores continues to increase, a key issue facing processor architects is the design of the on-die last-level cache (LLC). In this paper, we explore the LLC design space for multi-threaded RMS workloads by examining their working-set sizes, data sharing behavior, and spatial data locality. Our study reveals that these RMS workloads are memory intensive, have large working sets (greater than 16 MB on average), exhibit a significant amount of data sharing (about 47% on average), and show strong strided streaming access behavior, with 77% of accesses following a regular pattern. Based on these observations, we then investigate potential cache architecture choices for future multi-core designs. Our experiments show that, for these workloads, large DRAM caches can help address the large working sets (e.g., a 128 MB DRAM cache can reduce the average L1 miss penalty by 18%); a shared last-level cache provides better cache performance than a private cache (e.g., an 8 MB shared cache provides a 25% performance improvement over a private one of the same total size); and a stride-based hardware prefetcher provides a significant performance benefit of 25%. As a result, we suggest a memory hierarchy with a 128 MB DRAM cache, an 8 MB on-die SRAM shared cache, and an 8-entry stride prefetcher to accommodate RMS workloads.
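
The abstract names an 8-entry stride prefetcher but does not describe its internals. The following is a minimal sketch of one common way such a prefetcher can work, not the authors' design: the table is indexed by the program counter of the load, entries are replaced in least-recently-used order, and a prefetch is issued once the same non-zero stride has been seen twice in a row. The class name `StridePrefetcher`, the method `access`, and all of these policy choices are assumptions made for illustration.

```python
from collections import OrderedDict


class StridePrefetcher:
    """Toy PC-indexed stride prefetcher (illustrative assumptions only)."""

    def __init__(self, entries=8):
        self.entries = entries
        # Maps pc -> (last_addr, stride, confirmed); order tracks recency.
        self.table = OrderedDict()

    def access(self, pc, addr):
        """Record one memory access; return an address to prefetch, or None."""
        prefetch = None
        if pc in self.table:
            last_addr, stride, _ = self.table.pop(pc)
            new_stride = addr - last_addr
            # A stride is confirmed when it repeats and is non-zero.
            confirmed = new_stride == stride and stride != 0
            if confirmed:
                prefetch = addr + stride
            # Re-inserting moves this pc to the most-recently-used position.
            self.table[pc] = (addr, new_stride, confirmed)
        else:
            if len(self.table) >= self.entries:
                self.table.popitem(last=False)  # evict least recently used
            self.table[pc] = (addr, 0, False)
        return prefetch
```

Feeding one hypothetical load instruction the addresses 1000, 1064, 1128, 1192 yields no prefetch for the first two accesses, then prefetch addresses 1192 and 1256, since the 64-byte stride is confirmed on the third access.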

Keywords :

DRAM chips; SRAM chips; cache storage; memory architecture; microcomputers; DRAM cache; SRAM shared cache; cache architecture; data sharing behavior; memory hierarchy; memory intensive workload; memory size 128 MByte; memory size 8 MByte; multi-core processors; multi-threaded RMS workloads; on-die last level cache; recognition, mining, and synthesis; spatial data locality; stride prefetcher; working set sizes; Application software; Character recognition; Computer science; Distributed computing; Hardware; Microprocessors; Multicore processing; Prefetching; Random access memory; Scalability; RMS workload; memory performance; workload characterization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Computing, 2009. ISPDC '09. Eighth International Symposium on

Conference_Location :

Lisbon

Print_ISBN :

978-0-7695-3680-4

Type :

conf

DOI :

10.1109/ISPDC.2009.14

Filename :

5284359

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1828224