Title :
Spaler: Spark and GraphX based de novo genome assembler
Author :
Anas Abu-Doleh; Ümit V. Çatalyürek
Author_Institution :
Dept. of Electrical and Computer Engineering, The Ohio State University
Abstract :
The recent advancements in high-throughput genome sequencing technologies have accelerated the efficient discovery of novel genomes. De novo assembly is the first, and one of the most computationally intensive, steps in analyzing such novel genomes. In this work, we addressed the problem of parallelizing de Bruijn graph based de novo genome sequence assembly on distributed-memory systems. We proposed a new tool, Spaler, which assembles short reads efficiently and accurately. Spaler is built on the Spark framework and the GraphX API. We compared the performance of Spaler with other distributed-memory assemblers, in particular ABySS, Ray, and SWAP-Assembler. The results show that Spaler scales better than existing tools and produces comparable or better results in terms of solution quality.
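For readers unfamiliar with de Bruijn graph assembly, the following is a minimal single-machine sketch of the general technique the abstract refers to; it is not Spaler's distributed Spark/GraphX implementation. It assumes error-free reads and a fixed k-mer length `k`. Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a contig is extended along the graph while the path stays unbranched. The function names `build_de_bruijn`, `in_degrees`, and `extend_contig` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Hypothetical helper: map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Every k-mer contributes one edge prefix -> suffix, where
    prefix = kmer[:-1] and suffix = kmer[1:]. Duplicate k-mers collapse
    into a single edge because successors are stored in a set.
    """
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return edges


def in_degrees(edges):
    """Count how many distinct predecessors each node has."""
    indeg = defaultdict(int)
    for successors in edges.values():
        for node in successors:
            indeg[node] += 1
    return indeg


def extend_contig(edges, start):
    """Hypothetical helper: walk forward from `start` along an unbranched path.

    Stops when the current node has zero or several successors, when the
    next node has several predecessors, or when a node would repeat
    (a guard against looping forever on cycles).
    """
    indeg = in_degrees(edges)
    contig, node, seen = start, start, {start}
    while len(edges.get(node, ())) == 1:
        (nxt,) = edges[node]
        if indeg[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]  # each step along an edge adds one new base
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    graph = build_de_bruijn(["ATGGCGT", "GGCGTGC"], k=4)
    print(extend_contig(graph, "ATG"))
```

With the two overlapping reads above, the graph is a single linear path, so the walk from `ATG` reconstructs the merged sequence `ATGGCGTGC`. In a distributed assembler, the k-mer extraction step parallelizes naturally over reads, while the path walk requires coordination across graph partitions; how Spaler handles this is described in the paper itself, not in this sketch.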
Keywords :
"Merging","Assembly","Genomics","Bioinformatics","Indexes","DNA","Sparks"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363853