Title :
Enabling Proactive Data Management in Virtualized Hadoop Clusters Based on Predicted Data Activity Patterns
Author :
Kousiouris, G. ; Vafiadis, George ; Varvarigou, Theodora
Author_Institution :
Dept. of Electr. & Comput. Eng., Nat. Tech. Univ. of Athens, Athens, Greece
Abstract :
Hadoop clusters are growing in popularity because of their ability to parallelize and complete large-scale computational tasks on big data. Service offerings of this type have appeared in recent years, covering a need for dynamic, on-demand creation of such data analytics frameworks. The aim of this paper is to provide a mechanism for offering such virtual clusters as a service, with built-in intelligence for efficient management. These mechanisms predict future demand for the files in the HDFS cluster and dynamically adjust the corresponding replication factor for availability purposes, in order to improve performance and minimize storage overhead. To this end, real data have been used as input to the prediction framework, which is based on Fourier series analysis because of its ability to capture the different periodicities that can influence service usage. Multiple time-step ahead prediction is performed in order to enable proactive management (e.g. a suitable replication strategy). We describe the framework's architecture, the necessary modifications to the client side of Apache Hadoop for data logging, and the results of applying the method to two real-world datasets.
Keywords :
Big Data; Fourier analysis; Fourier series; data analysis; Fourier series analysis; HDFS cluster; data activity patterns; data analytics frameworks; data logging; multiple time-step ahead prediction; proactive data management; proactive management; service offerings; two real world datasets; virtualized Hadoop clusters; Availability; File systems; Internet; Predictive models; Servers; Time series analysis; dynamic management; proactive replication; time series prediction
Conference_Title :
P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2013 Eighth International Conference on
Conference_Location :
Compiegne
DOI :
10.1109/3PGCIC.2013.8