Title :
Using simulation to explore distributed key-value stores for extreme-scale system services
Author :
Wang, Ke ; Kulkarni, Akhil ; Lang, Michael ; Arnold, Dorian ; Raicu, Ioan
Author_Institution :
Illinois Inst. of Technol., Chicago, IL, USA
Abstract :
Owing to the significantly high rate of component failures at extreme scales, system services will need to be failure-resistant, adaptive, and self-healing. A majority of HPC services are still designed around a centralized paradigm and hence are susceptible to scaling issues. Peer-to-peer services have proved themselves at scale for wide-area Internet workloads. Distributed key-value stores (KVS) are widely used as a building block for these services, but are not prevalent in HPC services. In this paper, we simulate KVS for various service architectures and examine the design trade-offs as applied to HPC service workloads to support extreme-scale systems. The simulator is validated against existing distributed KVS-based services. Via simulation, we demonstrate how failure, replication, and consistency models affect performance at scale. Finally, we emphasize the general applicability of KVS to HPC services by feeding real HPC service workloads into the simulator and presenting a KVS-based distributed job launch prototype.
Keywords :
Internet; discrete event simulation; fault tolerant computing; parallel processing; peer-to-peer computing; system recovery; wide area networks; HPC service workloads; KVS-based distributed job launch prototype; adaptive services; centralized paradigm; component failures; consistency models; distributed KVS-based services; distributed key-value stores; extreme-scale system services; extreme-scale systems; failure models; failure-resistant services; peer-to-peer services; replication models; self-healing services; service architectures; wide-area Internet workloads; Abstracts; Data models; Fingers; Laboratories; Scalability; Servers; Taxonomy; Discrete Event Simulation; Extreme Scales; Key-Value Store; System Services
Conference_Title :
2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Conference_Location :
Denver, CO
Print_ISBN :
978-1-4503-2378-9
DOI :
10.1145/2503210.2503239