Title :
A Hadoop performance model for multi-rack clusters
Author :
Jungkyu Han ; Ishii, M. ; Makino, Hiroaki
Author_Institution :
Software Innovation Center, Nippon Telegraph & Telephone Corp., Tokyo, Japan
Abstract :
Hadoop has become the de facto standard framework for big data analysis because of its scalability. Despite the importance of Hadoop's scalability, few studies have examined its scalability in multi-rack clusters. In real-world multi-rack clusters, network topology becomes a major scalability bottleneck because of limited network switch capacity. Adding servers to a Hadoop cluster in such a situation is a waste of resources. Users can therefore save cost by efficiently measuring the network's influence on Hadoop before they add new servers to their clusters. In this paper, we describe a Hadoop performance model for multi-rack clusters. We modeled the network's influence on Hadoop and achieved about 95% accuracy relative to real measurements. Furthermore, we used our model to predict Hadoop's scalability in large clusters and show that Hadoop scales sufficiently even in multi-rack clusters.
Keywords :
data analysis; distributed processing; pattern clustering; Hadoop performance model; Hadoop scalability; big data analysis; de facto standard framework; limited network switch capacity; multirack clusters; network influence measurement; network topology; Equations; Information management; Mathematical model; Network topology; Scalability; Servers; Switches; Distributed Data Processing; Hadoop; Map-Reduce; Performance Modeling;
Conference_Title :
Computer Science and Information Technology (CSIT), 2013 5th International Conference on
Conference_Location :
Amman
DOI :
10.1109/CSIT.2013.6588791