Regional Information Center for Science and Technology - On the use of shared storage in shared-nothing environments

DocumentCode :

659440

Title :

On the use of shared storage in shared-nothing environments

Author :

Krish, K.R. ; Khasymski, Aleksandr ; Guanying Wang ; Butt, Ali R. ; Makkar, Gaurav

fYear :

2013

fDate :

6-9 Oct. 2013

Firstpage :

313

Lastpage :

318

Abstract :

Shared-nothing environments, exemplified by systems such as MapReduce and Hadoop, employ node-local storage to achieve high scalability. The exponential growth in application datasets, however, demands ever higher I/O throughput and disk capacity. Simply equipping individual nodes in a Hadoop cluster with more disks is not scalable as it: increases the per-node cost, increases the probability of storage failure at the node, and worsens node failure recovery times. To this end, we propose dividing a Hadoop rack into several (small) sub-racks, and consolidating disks of a sub-rack's compute nodes into a separate shared Localized Storage Node (LSN) within the sub-rack. Such a shared LSN is easier to manage and provision, and can offer an economically better solution by employing overall fewer disks at the LSN than the total of the sub-rack's individual nodes, while still achieving high I/O performance. In this paper, we provide a quantitative study on the impact of shared storage in Hadoop clusters. We utilize several typical Hadoop applications and test them on a medium-sized cluster and via simulations. Our evaluation shows that: (i) the staggered workload allows our design to support the same number of compute nodes at a comparable or better throughput using fewer total disks than in the node-local case, thus providing more efficient resource utilization; (ii) the impact of lost locality can be mitigated by better provisioning the LSN-node network interconnect and the number of disks in an LSN; and (iii) the consolidation of disks into an LSN is a viable and efficient alternative to the extant node-local storage design. Finally, we show that LSN-based design can deliver up to 39% performance improvement over standard Hadoop.

Keywords :

disc storage; memory architecture; resource allocation; shared memory systems; Hadoop cluster; Hadoop rack; I/O performance; I/O throughput; LSN-based design; LSN-node network interconnect; MapReduce; disk capacity; disks consolidation; localized storage node; lost locality; medium-sized cluster; node failure recovery; node-local storage design; resource utilization; shared LSN; shared storage; shared-nothing environments; staggered workload; storage failure; subrack nodes; Aggregates; Bandwidth; Computational modeling; Computer architecture; Standards; Throughput; Topology;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data, 2013 IEEE International Conference on

Conference_Location :

Silicon Valley, CA

Type :

conf

DOI :

10.1109/BigData.2013.6691589

Filename :

6691589

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=659440