Title :
ASCAR: Automating contention management for high-performance storage systems
Author :
Li, Yan ; Lu, Xiaoyuan ; Miller, Ethan L. ; Long, Darrell D. E.
Author_Institution :
Storage Syst. Res. Center, Univ. of California, Santa Cruz, Santa Cruz, CA, USA
Abstract :
High-performance parallel storage systems, such as those used by supercomputers and data centers, can suffer from performance degradation when a large number of clients contend for limited resources such as bandwidth. This contention lowers the efficiency of the system and causes unwanted speed variance. We present the Automatic Storage Contention Alleviation and Reduction system (ASCAR), a storage traffic management system for improving bandwidth utilization and the fairness of resource allocation. ASCAR regulates I/O traffic from the clients using a rule-based algorithm that controls the congestion window and rate limit. The rule-based client controllers respond quickly to burst I/O because no runtime coordination between clients or with a central coordinator is needed; they are also autonomous, so the system has no scale-out bottleneck. Finding optimal rules can be a challenging task that requires expertise and numerous experiments. ASCAR includes a SHAred-nothing Rule Producer (SHARP) that produces rules in an unsupervised manner by systematically exploring the solution space of possible rule designs and evaluating the target workload under the candidate rule sets. Evaluation shows that our ASCAR prototype can improve the throughput of all tested workloads, some by as much as 35%. ASCAR improves the throughput of a NASA NPB BTIO checkpoint workload by 33.5% and reduces its speed variance by 55.4% at the same time. The optimization time and controller overhead are unrelated to the scale of the system; thus, ASCAR has the potential to support future large-scale systems that can have millions of clients and thousands of servers. As a pure client-side solution, ASCAR needs no change to either the hardware or the server software.
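The abstract does not give ASCAR's rule format, so the following is only a minimal illustrative sketch of what a rule-based client controller and a rule search loop might look like. Every name here (Rule, ClientController, latency_ratio, EXAMPLE_RULES, search_rules) and every numeric threshold is an assumption made for illustration, not the authors' design. It shows a client adjusting its congestion window and rate limit from local observations only, with no coordination with other clients.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule: fires when the observed latency ratio is in [lo, hi).
    lo: float
    hi: float
    window_delta: int    # added to the congestion window (requests in flight)
    rate_factor: float   # multiplied into the rate limit


class ClientController:
    """Per-client controller; it uses only local observations (an assumption)."""

    def __init__(self, rules, window=8, rate_limit=100.0):
        self.rules = rules
        self.window = window
        self.rate_limit = rate_limit

    def on_observation(self, latency_ratio):
        # Apply the first rule whose range contains the observation.
        for rule in self.rules:
            if rule.lo <= latency_ratio < rule.hi:
                self.window = max(1, self.window + rule.window_delta)
                self.rate_limit = max(1.0, self.rate_limit * rule.rate_factor)
                break
        return self.window, self.rate_limit


# Illustrative rule set: grow when latency is low, back off when it is high.
EXAMPLE_RULES = [
    Rule(0.0, 1.2, +1, 1.0),
    Rule(1.2, 2.0, 0, 1.0),
    Rule(2.0, float("inf"), -2, 0.5),
]


def search_rules(candidates, evaluate):
    # Stand-in for an offline rule search: score each candidate rule set with a
    # caller-supplied evaluate() (e.g. measured throughput) and keep the best.
    return max(candidates, key=evaluate)
```

In this sketch, a controller built with ClientController(EXAMPLE_RULES) grows its window by one while the observed latency ratio stays below 1.2 and halves its rate limit once the ratio reaches 2.0. The evaluate argument to search_rules is a placeholder for running the target workload under a candidate rule set and measuring the result.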
Keywords :
parallel processing; resource allocation; storage management; ASCAR; I/O traffic; NASA NPB BTIO checkpoint workload; SHARP; automatic storage contention alleviation and reduction system; automating contention management; bandwidth utilization; burst I/O; central coordinator; client-side solution; congestion window; controller overhead; fairness; hardware software; high-performance parallel storage systems; optimal rules; optimization time; performance degradation; rate limit; rule based algorithm; rule designs; rule sets; rule-based client controllers; runtime coordination; server software; shared-nothing rule producer; speed variances; storage traffic management system; target workload; workloads throughput; Bandwidth; Control systems; Optimization; Process control; Servers; Software; Throughput;
Conference_Title :
Mass Storage Systems and Technologies (MSST), 2015 31st Symposium on
Conference_Location :
Santa Clara, CA
DOI :
10.1109/MSST.2015.7208287