DocumentCode :
3675979
Title :
Scaling Up Bioinformatics Workflows with Dynamic Job Expansion: A Case Study Using Galaxy and Makeflow
Author :
Nicholas Hazekamp;Joseph Sarro;Olivia Choudhury;Sandra Gesing;Scott Emrich;Douglas Thain
Author_Institution :
Dept. of Comput. Sci. &
fYear :
2015
Firstpage :
332
Lastpage :
341
Abstract :
Logical workflow management systems provide a user-friendly portal through which data can be processed using a sequence of standard tools. These logical workflows are a natural way to express the high-level intent of the user, and to share the structure and the results with other users. However, logical workflows are not necessarily suited to expressing parallelism for very large runs. As the amount of data is scaled up, the run time of each node in the logical workflow may become extreme. We propose a technique of job expansion to solve this problem. When job expansion is applied to a logical workflow, each node in the workflow is itself expanded into a large performance workflow that may consist of hundreds to thousands of tasks that can be executed in parallel, thus enabling high concurrency and scalability. From the user's perspective, nothing has changed and the logical workflow remains in its original form. To demonstrate this technique, we have applied job expansion to a selection of bioinformatics applications running in the Galaxy workflow management system. Each job in the workflow is expanded into a highly parallel workflow executed using Makeflow, which is well suited to expressing high levels of parallelism. Work Queue is then used for execution because of its ability to quickly dispatch tasks and cache files for later reuse. After applying job expansion, we improve the execution time of BWA by 18X and of GATK by 402X, with a total speedup of 61.5X on the workflow. We also examine the system's behavior since its launch to analyze its effectiveness.
Keywords :
"Bioinformatics","Complexity theory","Parallel processing","Runtime","Indexes","Portals","Concurrent computing"
Publisher :
IEEE
Conference_Title :
e-Science (e-Science), 2015 IEEE 11th International Conference on
Type :
conf
DOI :
10.1109/eScience.2015.39
Filename :
7304316