• DocumentCode
    632452
  • Title

    A Hadoop performance model for multi-rack clusters

  • Author

    Jungkyu Han ; Ishii, M. ; Makino, Hiroaki

  • Author_Institution
    Software Innovation Center, Nippon Telegraph & Telephone Corp., Tokyo, Japan
  • fYear
    2013
  • fDate
    27-28 March 2013
  • Firstpage
    265
  • Lastpage
    274
  • Abstract
    Hadoop has become the de facto standard framework for big data analysis due to its scalability. Despite the importance of Hadoop's scalability, few studies have examined it in multi-rack clusters. In real-world multi-rack clusters, network topology becomes a major scalability bottleneck because of limited network switch capacity. Adding servers to a Hadoop cluster in such a situation is a waste of resources. Users can therefore save cost by efficiently measuring the network's influence on Hadoop before they add new servers to their clusters. In this paper, we describe a Hadoop performance model for multi-rack clusters. We modeled the network's influence on Hadoop and achieved about 95% accuracy relative to real measurements. Furthermore, we used our model to predict Hadoop scalability in large clusters and show that Hadoop scales sufficiently even in multi-rack clusters.
  • Keywords
    data analysis; distributed processing; pattern clustering; Hadoop performance model; Hadoop scalability; big data analysis; de facto standard framework; limited network switch capacity; multirack clusters; network influence measurement; network topology; Equations; Information management; Mathematical model; Network topology; Scalability; Servers; Switches; Distributed Data Processing; Hadoop; Map-Reduce; Performance Modeling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology (CSIT), 2013 5th International Conference on
  • Conference_Location
    Amman
  • Type

    conf

  • DOI
    10.1109/CSIT.2013.6588791
  • Filename
    6588791