DocumentCode :
3632344
Title :
Parallel short sequence mapping for high throughput genome sequencing
Author :
Doruk Bozdag;Catalin C. Barbacioru;Umit V. Catalyurek
Author_Institution :
The Ohio State University, Dept. of Biomedical Informatics, Columbus, 43210, USA
fYear :
2009
Firstpage :
1
Lastpage :
10
Abstract :
With the advent of next-generation high-throughput sequencing instruments, large volumes of short sequence data are generated at an unprecedented rate. Processing and analyzing these massive data requires overcoming several challenges, including mapping the generated short sequences to a reference genome. This computationally intensive process takes on the order of days using existing sequential techniques on large-scale datasets. In this work, we propose six parallelization methods to speed up short sequence mapping and to reduce the execution time to just a few hours for such large datasets. We present these methods comparatively and give a theoretical cost model for each. Experimental results on real datasets demonstrate the effectiveness of the parallel methods and indicate that the cost models help estimate parallel execution time accurately. Based on these cost models, we implemented a selection function to predict the best method for a given scenario. To the best of our knowledge, this is the first study on parallelization of the short sequence mapping problem.
Keywords :
"Throughput","Genomics","Bioinformatics","Sequences","Instruments","Costs","Genetics","Biomedical informatics","Predictive models","DNA"
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
ISSN :
1530-2075
Print_ISBN :
978-1-4244-3751-1
Type :
conf
DOI :
10.1109/IPDPS.2009.5161075
Filename :
5161075