DocumentCode :
3717235
Title :
Spaler: Spark and GraphX based de novo genome assembler
Author :
Anas Abu-Doleh; Ümit V. Çatalyürek
Author_Institution :
Dept. of Electrical and Computer Engineering, The Ohio State University
fYear :
2015
Firstpage :
1013
Lastpage :
1018
Abstract :
Recent advances in high-throughput genome sequencing technologies have accelerated the efficient discovery of novel genomes. De novo assembly is the first, and one of the most computationally intensive, steps in analyzing such novel genomes. In this work, we addressed the problem of parallelizing de Bruijn graph based de novo genome sequence assembly on distributed memory systems. We proposed a new tool, Spaler, which assembles short reads efficiently and accurately. Spaler is based on the Spark framework and the GraphX API. We compared the performance of Spaler to other distributed memory based assemblers, in particular ABySS, Ray, and SWAP-Assembler. The results show that Spaler scales better than existing tools and produces comparable or better results in terms of solution quality.
Keywords :
"Merging","Assembly","Genomics","Bioinformatics","Indexes","DNA","Sparks"
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363853
Filename :
7363853