Title :
vHadoop: A Scalable Hadoop Virtual Cluster Platform for MapReduce-Based Parallel Machine Learning with Performance Consideration
Author :
Ye, Kejiang ; Jiang, Xiaohong ; He, Yanzhang ; Li, Xiang ; Yan, Haiming ; Huang, Peng
Author_Institution :
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
Abstract :
Big data processing is becoming increasingly important due to the continuous growth in the amount of data generated in fields such as particle physics, human genomics, and earth observation. However, the efficiency of processing large-scale data on modern virtual infrastructure, especially virtualized cloud computing infrastructure, is not clear. This paper focuses on the performance of Hadoop virtual clusters and proposes vHadoop, a scalable Hadoop virtual cluster platform for large-scale MapReduce-based parallel data processing. We first describe the design and implementation of the vHadoop platform. We then perform a series of experiments to investigate both its static and dynamic performance, including the performance characterization of a cross-domain Hadoop virtual cluster and the live migration of a Hadoop virtual cluster. After that, we use the vHadoop platform to run six typical parallel clustering algorithms (Canopy, Dirichlet, Fuzzy k-Means, k-Means, Mean Shift, and MinHash) on two typical datasets. Experimental results verify the efficiency of the vHadoop platform in processing MapReduce-based parallel machine learning applications.
Keywords :
cloud computing; learning (artificial intelligence); parallel processing; MapReduce based parallel data processing; MapReduce based parallel machine learning; cross domain hadoop virtual cluster; parallel clustering algorithm; scalable hadoop virtual cluster platform; vHadoop platform; virtual infrastructure; virtualized cloud computing infrastructure; Benchmark testing; Machine learning; Machine learning algorithms; Monitoring; Parallel machines; Virtual machining; Big Data; Cloud Computing; Hadoop; Machine Learning; MapReduce; Virtual Cluster
Conference_Title :
Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2893-7
DOI :
10.1109/ClusterW.2012.32