• DocumentCode
    3537751
  • Title

    Data-Intensive Workload Consolidation for the Hadoop Distributed File System

  • Author

    Moraveji, Reza ; Taheri, Javid ; Farahabady, Mohammad Reza Hosseiny ; Rizvandi, Nikzad Babaii ; Zomaya, Albert Y.

  • Author_Institution
    Sch. of Inf. Technol., Univ. of Sydney & Nat. ICT Australia (NICTA), Sydney, NSW, Australia
  • fYear
    2012
  • fDate
    20-23 Sept. 2012
  • Firstpage
    95
  • Lastpage
    103
  • Abstract
    Workload consolidation, the sharing of physical resources among multiple workloads, is a promising technique for saving cost and energy in cluster computing systems. This paper highlights a number of challenges associated with workload consolidation for Hadoop, one of the current state-of-the-art data-intensive cluster computing systems. Through a systematic step-by-step procedure, we investigate challenges for efficient server consolidation in Hadoop environments. To this end, we first investigate the relationship between last-level cache (LLC) contention and throughput degradation for consolidated workloads on a single physical server employing the Hadoop Distributed File System (HDFS). We then investigate the general case of consolidation on multiple physical servers so that their throughput never falls below a desired/predefined utilization level. We use our empirical results to model consolidation as a classic two-dimensional bin packing problem and then design a computationally efficient greedy algorithm to achieve minimum throughput degradation on multiple servers. Results are promising and show that our greedy approach achieves near-optimal solutions in all tested cases.
  • Keywords
    bin packing; cache storage; distributed databases; greedy algorithms; network operating systems; workstation clusters; Hadoop distributed file system; LLC contention; cluster computing systems; data-intensive workload consolidation; greedy algorithm; last level cache; physical resources; two-dimensional bin packing problem; Australia; Degradation; Educational institutions; File systems; Servers; Throughput; Writing; Bin Packing; Hadoop; Last Level Cache; Throughput Degradation; Workload Consolidation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1550-5510
  • Print_ISBN
    978-1-4673-2901-9
  • Type

    conf

  • DOI
    10.1109/Grid.2012.25
  • Filename
    6319159
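
The abstract describes modeling consolidation as a two-dimensional bin packing problem solved with a greedy algorithm. The record does not give the algorithm itself, so the following is only a minimal sketch of one common greedy heuristic for this class of problem (first-fit decreasing), not the authors' method. The two resource dimensions, the normalized server capacity of 1.0, and all names (`first_fit_decreasing`, `demand_a`, `demand_b`) are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical workload description: (name, demand_a, demand_b).
# Each demand is a fraction of one server's capacity in some resource
# dimension; which two resources these are is an assumption.
Workload = Tuple[str, float, float]

EPS = 1e-9  # tolerance for floating-point rounding in capacity checks


def first_fit_decreasing(workloads: List[Workload],
                         capacity: float = 1.0) -> List[List[str]]:
    """Greedily assign workloads to servers (bins) with two capacity limits.

    Returns one list of workload names per server that was opened.
    """
    # Consider the most demanding workloads first, ranked by the larger
    # of their two demands.
    ordered = sorted(workloads, key=lambda w: max(w[1], w[2]), reverse=True)

    servers = []  # each entry: [used_a, used_b, [workload names]]
    for name, demand_a, demand_b in ordered:
        for server in servers:
            fits_a = server[0] + demand_a <= capacity + EPS
            fits_b = server[1] + demand_b <= capacity + EPS
            if fits_a and fits_b:
                # Place the workload on the first server with room
                # in both dimensions.
                server[0] += demand_a
                server[1] += demand_b
                server[2].append(name)
                break
        else:
            # No open server can host it, so open a new one.
            servers.append([demand_a, demand_b, [name]])

    return [server[2] for server in servers]


if __name__ == "__main__":
    example = [
        ("w1", 0.6, 0.2),
        ("w2", 0.5, 0.5),
        ("w3", 0.3, 0.7),
        ("w4", 0.2, 0.1),
    ]
    print(first_fit_decreasing(example))
```

First-fit decreasing is fast and simple but does not guarantee an optimal packing; it is shown here only to make the bin packing formulation concrete.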