Title :
Log-Structured Global Array for Efficient Multi-Version Snapshots
Author :
Fujita, Hajime ; Nan Dun ; Rubenstein, Zachary A. ; Chien, Andrew A.
Author_Institution :
Argonne Nat. Lab., Univ. of Chicago, Chicago, IL, USA
Abstract :
In exascale systems, increasing error rate -- particularly silent data corruption -- is a major concern. The Global View Resilience (GVR) system builds a new model of application resilience on versioned global arrays. These arrays can be exploited for flexible, application-specific error checking and recovery. We explore a fundamental challenge to the GVR model -- the cost of versioning. We propose a novel log-structured implementation that appends new data to an update log, simultaneously tracking modified regions and versioning incrementally. We compare the performance of log-structured arrays to traditional flat arrays using micro-benchmarks and three full applications, and show that versioning can be more than 10x faster and can reduce memory cost significantly. Further, in future systems with NVRAM, a log-structured approach is more tolerant of NVRAM limitations such as write bandwidth and wear-out.
Keywords :
configuration management; data structures; random-access storage; GVR system; NVRAM; application resilience; data corruption; error rate; exascale systems; flexible-application-specific error checking; flexible-application-specific error recovery; global view resilience system; incremental versioning; log update; log-structured global array; microbenchmarks; multiversion snapshots; tracking modified regions; versioned global arrays; versioning cost; wear-out; write bandwidth; Arrays; Message systems; Nonvolatile memory; Random access memory; Resilience; Servers; global array; global view resilience; latent errors; log-structured array; multi-version;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
Conference_Location :
Shenzhen
DOI :
10.1109/CCGrid.2015.80
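Illustration :
The abstract describes appending new data to an update log while tracking modified regions and versioning incrementally. The following is a minimal sketch of that general idea, not the GVR implementation or its API: the class `LogStructuredArray` and its methods `put`, `snapshot`, `read`, and `modified_regions` are hypothetical names chosen for illustration, and the array is a single-process Python list rather than a distributed global array.

```python
class LogStructuredArray:
    """Toy versioned array (hypothetical, not the GVR API).

    Writes are appended to a log and never applied in place.
    """

    def __init__(self, size):
        self.size = size
        self.log = []          # append-only list of (offset, values) updates
        self.boundaries = [0]  # boundaries[v] = log length when version v was sealed

    def put(self, offset, values):
        """Append an update to the log instead of overwriting old data."""
        if offset < 0 or offset + len(values) > self.size:
            raise IndexError("update falls outside the array")
        self.log.append((offset, list(values)))

    def snapshot(self):
        """Seal the current state as a new version without copying any data."""
        self.boundaries.append(len(self.log))
        return len(self.boundaries) - 1

    def read(self, version=None):
        """Rebuild the contents at a version by replaying the log prefix.

        Version 0 is the initial all-zero array; None means the latest state.
        """
        end = len(self.log) if version is None else self.boundaries[version]
        data = [0] * self.size
        for offset, values in self.log[:end]:
            data[offset:offset + len(values)] = values
        return data

    def modified_regions(self, version):
        """Return (offset, length) pairs written since the previous version."""
        start, end = self.boundaries[version - 1], self.boundaries[version]
        return [(off, len(vals)) for off, vals in self.log[start:end]]


arr = LogStructuredArray(8)
arr.put(2, [5, 6])
v1 = arr.snapshot()
arr.put(3, [9])
v2 = arr.snapshot()
print(arr.read(v1))              # contents as of the first snapshot
print(arr.read(v2))              # contents as of the second snapshot
print(arr.modified_regions(v2))  # regions touched between v1 and v2
```

In this sketch, taking a version stores only a log position, whereas a flat-array scheme would copy the whole array per version; the trade-off is that reads of old versions must replay the log.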