DocumentCode :
668154
Title :
AptStore: Dynamic storage management for hadoop
Author :
Krish, K.R. ; Khasymski, Aleksandr ; Butt, Ali R. ; Tiwari, Sunita ; Bhandarkar, Milind
Author_Institution :
Virginia Tech, Blacksburg, VA, USA
fYear :
2013
fDate :
23-27 Sept. 2013
Firstpage :
1
Lastpage :
5
Abstract :
Typical Hadoop setups employ Direct Attached Storage (DAS) on compute nodes and uniform replication of data to sustain high I/O throughput and fault tolerance. However, not all data is accessed at the same time or at the same rate. Thus, if a large replication factor is used to support higher throughput for popular data, storage is wasted on unnecessary replicas of unpopular data as well. Conversely, if less replication is used to conserve storage for unpopular data, even popular data gets fewer replicas and thus lower I/O throughput. We present AptStore, a dynamic data management system for Hadoop that aims to improve overall I/O throughput while reducing storage cost. We design a tiered storage system that uses the standard DAS for popular data to sustain high I/O throughput, and network-attached enterprise filers as cost-effective, fault-tolerant, but lower-throughput storage for unpopular data. We also design a file Popularity Predictor (PP) that analyzes file system audit logs and predicts the appropriate storage policy for each file, and we use this information for transparent data movement between tiers. Our evaluation of AptStore on a real cluster shows a 21.3% improvement in application execution time over standard Hadoop, while trace-driven simulations show a 23.7% increase in read throughput and a 43.4% reduction in the storage capacity required by the system.
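The abstract does not say how the Popularity Predictor turns audit logs into per-file policies. The sketch below only illustrates the general idea of log-driven tier placement under stated assumptions. It counts read events (`cmd=open`) per path in simplified lines modeled loosely on HDFS audit log entries, which use tab-separated `key=value` fields, and it applies a plain frequency threshold. That threshold is a stand-in, not the authors' predictor. The tier labels `DAS` and `FILER`, the `threshold` parameter, and the helpers `parse_audit_line`, `count_reads`, and `assign_tiers` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical tier labels; the paper does not specify policy identifiers.
FAST_TIER = "DAS"    # direct-attached storage on compute nodes (popular data)
SLOW_TIER = "FILER"  # network-attached enterprise filer (unpopular data)


def parse_audit_line(line: str) -> Dict[str, str]:
    """Parse tab-separated key=value fields from one audit log line.

    Assumes a simplified HDFS-audit-style layout such as
    '<timestamp> INFO FSNamesystem.audit: allowed=true<TAB>ugi=alice<TAB>cmd=open<TAB>src=/path'.
    """
    fields: Dict[str, str] = {}
    for token in line.strip().split("\t"):
        key, sep, value = token.partition("=")
        if sep and key.strip():
            # Keep only the last word of the key, dropping any log prefix
            # (timestamp, level, logger name) that precedes the first field.
            fields[key.split()[-1]] = value
    return fields


def count_reads(lines: Iterable[str]) -> Counter:
    """Count read events (cmd=open) per file path."""
    reads: Counter = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("cmd") == "open" and "src" in fields:
            reads[fields["src"]] += 1
    return reads


def assign_tiers(reads: Counter, files: Iterable[str], threshold: int) -> Dict[str, str]:
    """Map each file to a tier: at least `threshold` reads -> fast tier.

    A plain frequency threshold, used here only as a stand-in for a
    popularity predictor; files never read count as 0 reads.
    """
    return {path: FAST_TIER if reads[path] >= threshold else SLOW_TIER for path in files}


# Hypothetical sample log: /data/hot is read 3 times, /data/cold once,
# and /data/new is only created (never read).
SAMPLE_LOG: List[str] = [
    "2013-09-23 10:00:00,001 INFO FSNamesystem.audit: allowed=true\tugi=alice\tcmd=open\tsrc=/data/hot",
    "2013-09-23 10:00:01,002 INFO FSNamesystem.audit: allowed=true\tugi=bob\tcmd=open\tsrc=/data/hot",
    "2013-09-23 10:00:02,003 INFO FSNamesystem.audit: allowed=true\tugi=carol\tcmd=open\tsrc=/data/hot",
    "2013-09-23 10:00:03,004 INFO FSNamesystem.audit: allowed=true\tugi=alice\tcmd=open\tsrc=/data/cold",
    "2013-09-23 10:00:04,005 INFO FSNamesystem.audit: allowed=true\tugi=bob\tcmd=create\tsrc=/data/new",
]

if __name__ == "__main__":
    tiers = assign_tiers(count_reads(SAMPLE_LOG), ["/data/hot", "/data/cold", "/data/new"], threshold=2)
    print(tiers)
    # → {'/data/hot': 'DAS', '/data/cold': 'FILER', '/data/new': 'FILER'}
```

A fixed read-count threshold is the simplest possible policy. A real predictor would likely weigh recency and access trends so that data can be moved between tiers before demand shifts, which is the kind of transparent movement the abstract describes.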
Keywords :
distributed processing; storage management; AptStore system; Hadoop; application execution time; compute nodes; data replication; direct attached storage; dynamic data management system; dynamic storage management; file popularity predictor; file system audit logs; input-output throughput; network-attached enterprise filers; read throughput; replication factor; storage capacity requirement; storage cost reduction; tiered storage; trace driven simulations; Analytical models; Energy consumption; Engines; Fault tolerance; Fault tolerant systems; Standards; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2013 IEEE International Conference on
Conference_Location :
Indianapolis, IN
Type :
conf
DOI :
10.1109/CLUSTER.2013.6702657
Filename :
6702657