• DocumentCode
    3692774
  • Title
    MRemu: An Emulation-Based Framework for Datacenter Network Experimentation Using Realistic MapReduce Traffic
  • Author
    Marcelo Veiga Neves;Cesar A.F. De Rose;Kostas Katrinis
  • Author_Institution
    Pontifical Catholic Univ. of Rio Grande do Sul, Porto Alegre, Brazil
  • fYear
    2015
  • Firstpage
    174
  • Lastpage
    177
  • Abstract
    As data volumes and the need for timely analysis grow, Big Data analytics frameworks have to scale out to hundreds or even thousands of commodity servers. While such a scale-out is crucial to sustain desired computational throughput/latency and storage capacity, it comes at the cost of increased network traffic volumes and a multiplicity of traffic patterns. Despite the clear dependency between the datacenter network (DCN) and time-to-insight through big data analysis, our experience as active networking researchers suggests that a large fraction of DCN research experimentation is conducted on network traces and/or synthetic flow traces. And while the respective results are often valuable as standalone contributions, in practice it turns out to be extremely difficult to quantitatively assess how the reported network optimization results translate to performance or fault-tolerance improvements for actual analytics runtimes, e.g., due to the ability of these runtimes to overlap communication with computation. This paper presents MRemu, an emulation-based framework for conducting reproducible datacenter network research using accurate MapReduce workloads at system scales that are relevant to the size of target deployments, without requiring access to a hardware infrastructure of such scale. We choose the MapReduce (MR) framework as a design point because it is a common representative of the most widely deployed frameworks for analysis of large volumes of structured and unstructured data and is reported to be highly sensitive to network performance. With MRemu, it is possible to quantify the impact of various network design parameters and software-defined control techniques on key performance indicators of a given MR application. We show through targeted experimental validation that MRemu exhibits high fidelity when compared to the performance of MR applications on a real scale-out cluster of 16 high-end servers.
  • Keywords
    "Emulation","Network topology","Big data","Topology","Servers","Data mining","Monitoring"
  • Publisher
    IEEE
  • Conference_Titel
    Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2015 IEEE 23rd International Symposium on
  • ISSN
    1526-7539
  • Type
    conf
  • DOI
    10.1109/MASCOTS.2015.36
  • Filename
    7330188