Balancing Thread-Level and Task-Level Parallelism for Data-Intensive Workloads on Clusters and Clouds

Author

Olivia Choudhury;Dinesh Rajan;Nicholas Hazekamp;Sandra Gesing;Douglas Thain;Scott Emrich

Author_Institution

Dept. of Comput. Sci. &

fYear

2015

Firstpage

390

Lastpage

393

Abstract

The runtime configuration of parallel and distributed applications remains a mysterious art. To tune an application on a particular system, the end user must choose the number of machines, the number of cores per task, the data partitioning strategy, and so on, which together produce a combinatorial explosion of choices. While one might try to exhaustively evaluate all choices in search of the optimal configuration, the end user's goal is simply to run the application once with reasonable performance by avoiding terrible configurations. To address this problem, we present a hybrid technique based on regression models for tuning data-intensive bioinformatics applications: the sequential computational kernel is characterized empirically and then incorporated into an ab initio model of the distributed system. We demonstrate this technique on the commonly used applications BWA, Bowtie2, and BLASR and validate the accuracy of our proposed models on clouds and clusters.
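The abstract describes the shape of the hybrid approach but not its equations. A minimal sketch of that shape, under loudly stated assumptions, is shown below: it fits a linear regression of sequential-kernel runtime against chunk size (one fit per thread count) and plugs the fit into a simple hypothetical model of a partition, run, and merge workflow in which tasks execute in waves on a fixed pool of cores. It then scans candidate (task count, threads per task) pairs and keeps the one with the lowest predicted runtime. The function names, the linear runtime form, the wave-based scheduling model, the `split_cost`/`merge_cost` parameters, and the timings in the demo are all illustrative assumptions, not the authors' published models or measurements.

```python
import math
from itertools import product


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_makespan(kernel_fit, input_size, num_tasks, threads, total_cores,
                     split_cost=0.0, merge_cost=0.0):
    """Hypothetical ab initio model of a partition/run/merge workflow.

    kernel_fit: (a, b) from fit_linear, measured at this thread count.
    split_cost: assumed seconds per input unit to partition the data.
    merge_cost: assumed seconds per task output to merge the results.
    """
    a, b = kernel_fit
    chunk = input_size / num_tasks                # data per task
    task_time = a + b * chunk                     # empirical kernel model
    concurrent = max(1, total_cores // threads)   # tasks that fit at once
    waves = math.ceil(num_tasks / concurrent)     # rounds of execution
    return split_cost * input_size + waves * task_time + merge_cost * num_tasks


def best_configuration(fits, input_size, total_cores, task_counts, **costs):
    """Return (predicted_time, num_tasks, threads) with the lowest prediction."""
    candidates = []
    for threads, num_tasks in product(fits, task_counts):
        if threads > total_cores:
            continue  # a single task cannot use more cores than exist
        t = predict_makespan(fits[threads], input_size, num_tasks,
                             threads, total_cores, **costs)
        candidates.append((t, num_tasks, threads))
    return min(candidates)


if __name__ == "__main__":
    # Made-up timings (seconds) for illustration only, NOT data from the paper:
    # kernel runtime on chunks of 1, 2 and 4 units at 1 and 4 threads.
    sizes = [1, 2, 4]
    fits = {
        1: fit_linear(sizes, [12.0, 22.0, 42.0]),
        4: fit_linear(sizes, [5.0, 8.0, 14.0]),
    }
    # With these numbers the scan picks 16 single-threaded tasks.
    print(best_configuration(fits, input_size=64, total_cores=16,
                             task_counts=[4, 16, 64], merge_cost=0.1))
```

In this toy setting, many small single-threaded tasks beat fewer multithreaded ones because the assumed 4-thread kernel scales sublinearly; changing the fitted coefficients or the merge cost shifts the balance between thread-level and task-level parallelism, which is the trade-off the title refers to.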

Keywords

"Computational modeling","Data models","Bioinformatics","Instruction sets","Predictive models","Parallel processing","Genomics"

Publisher

IEEE

Conference_Title

Cluster Computing (CLUSTER), 2015 IEEE International Conference on

Type

conf

DOI

10.1109/CLUSTER.2015.60

Filename

7307607