DocumentCode :
1828224
Title :
Understanding the Memory Behavior of Emerging Multi-core Workloads
Author :
Lin, Junmin ; Chen, Yu ; Li, Wenlong ; Jaleel, Aamer ; Tang, Zhizhong
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2009
fDate :
June 30 2009-July 4 2009
Firstpage :
153
Lastpage :
160
Abstract :
This paper characterizes the memory behavior of emerging RMS (recognition, mining, and synthesis) workloads for future multi-core processors. As multi-core processors proliferate across different application domains, and the number of on-die cores continues to increase, a key issue facing processor architects is the design of the on-die last-level cache (LLC). In this paper, we explore the LLC design space for multi-threaded RMS workloads by examining the working set sizes, data sharing behavior, and spatial data locality. Our study reveals that these RMS workloads are memory intensive, have large working-set sizes greater than 16 MB on average, exhibit a significant amount of data sharing, about 47% on average, and show strong strided streaming access behavior, with 77% of accesses following a regular pattern. Based on these observations, we then investigate the potential cache architecture choices for future multi-core designs. Our experiments show that, for these workloads, large DRAM caches can be useful to address their large working sets; e.g., a 128 MB DRAM cache can reduce the average L1 miss penalty by 18%; a shared last-level cache provides better cache performance than a private cache; e.g., an 8 MB shared cache provides a 25% performance improvement over a private one with the same total size; and a stride-based hardware prefetcher provides a significant performance benefit of 25%. As a result, we suggest a memory hierarchy with a 128 MB DRAM cache, an 8 MB on-die SRAM shared cache, and an 8-entry stride prefetcher to accommodate RMS workloads.
Keywords :
DRAM chips; SRAM chips; cache storage; memory architecture; microcomputers; DRAM cache; SRAM shared cache; cache architecture; data sharing behavior; memory hierarchy; memory intensive workload; memory size 128 MByte; memory size 8 MByte; multi-core processors; multi-threaded RMS workloads; on-die last level cache; recognition, mining, and synthesis; spatial data locality; stride prefetcher; working set sizes; Application software; Character recognition; Computer science; Distributed computing; Hardware; Microprocessors; Multicore processing; Prefetching; Random access memory; Scalability; RMS workload; memory performance; workload characterization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Computing, 2009. ISPDC '09. Eighth International Symposium on
Conference_Location :
Lisbon
Print_ISBN :
978-0-7695-3680-4
Type :
conf
DOI :
10.1109/ISPDC.2009.14
Filename :
5284359