GERBIL: MPI+YARN

Author

Luna Xu; Min Li; Ali R. Butt

Author_Institution

Virginia Tech, Blacksburg, VA, USA

fYear

2015

fDate

4-7 May 2015

Firstpage

627

Lastpage

636

Abstract

Emerging big data applications comprise rich, multi-faceted workflows with both compute-intensive and data-intensive tasks and intricate communication patterns. While MapReduce is an effective model for data-intensive tasks, the MPI programming model may be better suited to extracting high performance from compute-intensive tasks. Researchers have recognized this need to employ specialized models for different phases of a workflow, e.g., performing computations using MPI followed by visualizations using MapReduce. However, extant multi-cluster approaches are inefficient, as they entail moving data across clusters and converting between data formats. Consequently, there is a crucial need for disparate programming models to co-exist on the same set of resources. In this paper, we address this issue by designing GERBIL, a framework for transparently co-hosting unmodified MPI applications alongside MapReduce applications on the same cluster. GERBIL exploits YARN as the model-agnostic resource negotiator and provides an easy-to-use interface to users. GERBIL bridges the fundamental mismatch between YARN and MPI by designing an MPI-aware resource allocation mechanism. We also support five different optimizations: minimizing job wait time, achieving inter-process locality, achieving desired cluster utilization, minimizing network traffic, and minimizing job execution time, all in a multi-tenant environment. Our evaluation shows that GERBIL enables MPI executions with performance comparable to a native MPI setup, and improves the performance of compute-intensive applications by up to 133% compared to the corresponding MapReduce-based versions.

Keywords

Big Data; application program interfaces; data handling; message passing; parallel processing; resource allocation; GERBIL bridges; MPI applications; MPI programming model; MPI+YARN; MPI-aware resource allocation; MapReduce applications; big data applications; compute-intensive applications; compute-intensive tasks; data-intensive tasks; disparate programming models; high-performance tasks; model agnostic resource negotiator; multicluster approaches; multifaceted workflows; Clustering algorithms; Computational modeling; Containers; Data models; Programming; Resource management; Yarn

fLanguage

English

Publisher

IEEE

Conference_Titel

Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on

Conference_Location

Shenzhen

Type

conf

DOI

10.1109/CCGrid.2015.137

Filename

7152528