Title :
On the use of shared storage in shared-nothing environments
Author :
Krish, K.R. ; Khasymski, Aleksandr ; Wang, Guanying ; Butt, Ali R. ; Makkar, Gaurav
Abstract :
Shared-nothing environments, exemplified by systems such as MapReduce and Hadoop, employ node-local storage to achieve high scalability. The exponential growth in application datasets, however, demands ever higher I/O throughput and disk capacity. Simply equipping individual nodes in a Hadoop cluster with more disks is not scalable, as it increases the per-node cost, increases the probability of storage failure at the node, and worsens node failure recovery times. To address this, we propose dividing a Hadoop rack into several (small) sub-racks, and consolidating the disks of a sub-rack's compute nodes into a separate shared Localized Storage Node (LSN) within the sub-rack. Such a shared LSN is easier to manage and provision, and can offer an economically better solution by employing fewer disks overall at the LSN than the total across the sub-rack's individual nodes, while still achieving high I/O performance. In this paper, we provide a quantitative study of the impact of shared storage in Hadoop clusters. We utilize several typical Hadoop applications and test them on a medium-sized cluster and via simulations. Our evaluation shows that: (i) the staggered workload allows our design to support the same number of compute nodes at a comparable or better throughput using fewer total disks than in the node-local case, thus providing more efficient resource utilization; (ii) the impact of lost locality can be mitigated by better provisioning the LSN-node network interconnect and the number of disks in an LSN; and (iii) the consolidation of disks into an LSN is a viable and efficient alternative to the extant node-local storage design. Finally, we show that an LSN-based design can deliver up to 39% performance improvement over standard Hadoop.
Keywords :
disc storage; memory architecture; resource allocation; shared memory systems; Hadoop cluster; Hadoop rack; I/O performance; I/O throughput; LSN-based design; LSN-node network interconnect; MapReduce; disk capacity; disks consolidation; localized storage node; lost locality; medium-sized cluster; node failure recovery; node-local storage design; resource utilization; shared LSN; shared storage; shared-nothing environments; staggered workload; storage failure; subrack nodes; Aggregates; Bandwidth; Computational modeling; Computer architecture; Standards; Throughput; Topology;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691589