• DocumentCode
    720553
  • Title
    GERBIL: MPI+YARN
  • Author
    Luna Xu; Min Li; Ali R. Butt
  • Author_Institution
    Virginia Tech, Blacksburg, VA, USA
  • fYear
    2015
  • fDate
    4-7 May 2015
  • Firstpage
    627
  • Lastpage
    636
  • Abstract
    Emerging big data applications comprise rich multi-faceted workflows with both compute-intensive and data-intensive tasks, and intricate communication patterns. While MapReduce is an effective model for data-intensive tasks, the MPI programming model may be better suited for extracting high performance for compute-intensive tasks. Researchers have recognized this need to employ specialized models for different phases of a workflow, e.g., performing computations using MPI followed by visualizations using MapReduce. However, extant multi-cluster approaches are inefficient as they entail data movement across clusters and porting across data formats. Consequently, there is a crucial need for disparate programming models to co-exist on the same set of resources. In this paper, we address the above issue by designing GERBIL, a framework for transparently co-hosting unmodified MPI applications alongside MapReduce applications on the same cluster. GERBIL exploits YARN as the model-agnostic resource negotiator, and provides an easy-to-use interface to the users. GERBIL bridges the fundamental mismatch between YARN and MPI by designing an MPI-aware resource allocation mechanism. We also support five different optimizations: minimizing job wait time, achieving inter-process locality, achieving desired cluster utilization, minimizing network traffic, and minimizing job execution time, all in a multi-tenant environment. Our evaluation shows that GERBIL enables MPI executions with performance comparable to a native MPI setup, and improves the performance of compute-intensive applications by up to 133% when compared to the corresponding MapReduce-based versions.
  • Keywords
    Big Data; application program interfaces; data handling; message passing; parallel processing; resource allocation; GERBIL bridges; MPI applications; MPI programming model; MPI+YARN; MPI-aware resource allocation; MapReduce applications; big data applications; compute-intensive applications; compute-intensive tasks; data-intensive tasks; disparate programming models; high-performance tasks; model agnostic resource negotiator; multicluster approaches; multifaceted workflows; Clustering algorithms; Computational modeling; Containers; Data models; Programming; Resource management; Yarn;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
  • Conference_Location
    Shenzhen
  • Type
    conf
  • DOI
    10.1109/CCGrid.2015.137
  • Filename
    7152528