Implementing high availability memory with a duplication cache

Author

Aggarwal, Nidhi ; Smith, James E. ; Saluja, Kewal K. ; Jouppi, Norman P. ; Ranganathan, Parthasarathy

Author_Institution

Univ. of Wisconsin-Madison, Madison, WI

fYear

2008

fDate

8-12 Nov. 2008

Firstpage

71

Lastpage

82

Abstract

High availability systems typically rely on redundant components and functionality to achieve fault detection, isolation, and failover. In the future, increases in error rates will make high availability important even in the commodity and volume market. Systems will be built out of chip multiprocessors (CMPs) with multiple identical components that can be configured to provide redundancy for high availability. However, the 100% overhead of making all components redundant is going to be unacceptable for the commodity market, especially when not all applications require high availability. In particular, duplicating the entire memory, as current high availability systems (e.g., NonStop and Stratus) do, is problematic given that system costs are going to be dominated by the cost of memory. In this paper, we propose a novel technique called a duplication cache to reduce the overhead of memory duplication in CMP-based high availability systems. A duplication cache is a reserved area of main memory that holds copies of pages belonging to the current write working set (the set of actively modified pages) of running processes. All other pages are marked as read-only and are kept only as a single, shared copy. The size of the duplication cache can be configured dynamically at runtime, which allows system designers to trade off the cost of memory duplication against a minor performance overhead. We extensively analyze the effectiveness of our duplication cache technique and show that, for a range of benchmarks, memory duplication can be reduced by 60-90% with performance degradation ranging from 1% to 12%. On average, a duplication cache can reduce memory duplication by 60% for a performance overhead of 4% and by 90% for a performance overhead of 5%.
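
The mechanism described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: the class name `DuplicationCache`, its counters, and the least-recently-written eviction policy are all hypothetical choices made for illustration, since the abstract does not say how pages are replaced. The model tracks which pages currently hold a second copy; a write to a single-copy (read-only) page counts as a fault and brings the page into the cache.

```python
from collections import OrderedDict


class DuplicationCache:
    """Toy model: only pages in the write working set hold a second copy."""

    def __init__(self, capacity_pages):
        if capacity_pages < 1:
            raise ValueError("capacity_pages must be at least 1")
        self.capacity = capacity_pages
        # Page ids that currently have a duplicate, oldest write first.
        self.duplicated = OrderedDict()
        self.write_faults = 0  # writes that hit a read-only, single-copy page
        self.evictions = 0     # pages demoted back to a single read-only copy

    def write(self, page):
        """Record a write; reads are not modeled because they need no duplicate."""
        if page in self.duplicated:
            self.duplicated.move_to_end(page)  # refresh recency
            return
        self.write_faults += 1
        if len(self.duplicated) >= self.capacity:
            # Assumed policy: evict the least recently written page.
            self.duplicated.popitem(last=False)
            self.evictions += 1
        self.duplicated[page] = True

    def duplicated_fraction(self, total_pages):
        """Fraction of all pages that currently hold a second copy."""
        return len(self.duplicated) / total_pages


cache = DuplicationCache(capacity_pages=2)
for page in [1, 2, 1, 3, 1]:
    cache.write(page)
print(cache.write_faults, cache.evictions, list(cache.duplicated))
```

In this model, shrinking `capacity_pages` lowers the duplicated fraction but raises the number of write faults, which mirrors the trade-off between memory cost and performance overhead that the abstract describes.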

Keywords

cache storage; microprocessor chips; chip multiprocessors; duplication cache; high availability memory; memory duplication; system costs; Availability; Costs; Degradation; Electrical equipment industry; Error analysis; Fault detection; Microprocessors; Performance analysis; Redundancy; System-on-a-chip; component; high availability; low cost availability; selective replication

fLanguage

English

Publisher

IEEE

Conference_Titel

Microarchitecture, 2008. MICRO-41. 2008 41st IEEE/ACM International Symposium on

Conference_Location

Lake Como

ISSN

1072-4451

Print_ISBN

978-1-4244-2836-6

Electronic_ISBN

1072-4451

Type

conf

DOI

10.1109/MICRO.2008.4771780

Filename

4771780