DocumentCode :
602592
Title :
RECAP: A region-based cure for the common cold (cache)
Author :
Zebchuk, J. ; Cain, H.W. ; Xin Tong ; Srinivasan, V. ; Moshovos, Andreas
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
fYear :
2013
fDate :
23-27 Feb. 2013
Firstpage :
83
Lastpage :
94
Abstract :
Virtualization has become a magic bullet to increase utilization, improve security, lower costs, and reduce management overheads. In many scenarios, the number of virtual machines consolidated onto a single processor has grown even faster than the number of hardware threads. This results in multiprogrammed virtualization where many virtual machines time-share a single processor core. Such fine-grain sharing comes at a cost; each time a virtual machine gets scheduled by the hypervisor, it effectively begins with a “cold” cache, since any cache blocks it accessed in the past have likely been evicted by other virtual machines. Recently, cache restoration prefetchers have been shown to reduce the cold cache effects caused by multiprogrammed virtualization. However, these prefetchers waste large amounts of bandwidth prefetching many useless blocks and writing and reading metadata to and from memory. Using block reuse based filtering, we enhance two existing cache restoration prefetchers and reduce their bandwidth overheads by 16% and 32%, with slowdowns of only 0.5% and 0.6%. We also propose a Region-Based Cache Restoration Prefetcher (RECAP), which groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks to prefetch the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that exploits spatial locality, RECAP provides a robust prefetcher that improves performance by up to 42% for some applications and only uses 14% more bandwidth and 3.5% more power than a stride prefetcher, on average. Compared to other prefetchers designed for multiprogrammed virtualization, RECAP provides comparable performance while using an average of 12% to 27% less bandwidth, and reducing the energy-delay product of the L2 cache and main memory by 12% to 13% on average.
Keywords :
cache storage; data compression; multiprogramming; power aware computing; processor scheduling; supervisory programs; virtual machines; virtualisation; L2 cache; RECAP; bandwidth overhead reduction; block reuse-based filtering; cache blocks; coarse-grain memory regions; cold cache effect reduction; compression technique; energy-delay product reduction; fine-grain sharing; hypervisor; management overhead reduction; metadata reading; metadata writing; multiprogrammed virtualization; performance improvement; processor scheduling; region-based cache restoration prefetcher; single processor core time-share; spatial locality; virtual machines; Bandwidth; Hardware; Pollution; Prefetching; Virtual machine monitors; Virtual machining; Virtualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on
Conference_Location :
Shenzhen
ISSN :
1530-0897
Print_ISBN :
978-1-4673-5585-8
Type :
conf
DOI :
10.1109/HPCA.2013.6522309
Filename :
6522309