DocumentCode
2957919
Title
A Parallel Algorithm for Spectrum-based Short Read Error Correction
Author
Shah, Ankit R. ; Chockalingam, Sriram ; Aluru, Srinivas
Author_Institution
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Bombay, Mumbai, India
fYear
2012
fDate
21-25 May 2012
Firstpage
60
Lastpage
70
Abstract
Correcting sequence errors in high-throughput DNA sequencing by taking advantage of redundant sampling and low error rates is often an important first step in applications of this technology. Consequently, a number of error correction methods have been developed in recent years. Due to an order-of-magnitude throughput gain per year, some of these technologies are now generating upwards of a billion reads per run. In this paper, we present an algorithm for parallelizing error correction methods that are based on the frequency spectrum of k-mers observed in the input reads. Based on this, we present a parallelization of Reptile, a recently introduced error correction method that employs frequency spectra of two different lengths, one for identifying correction possibilities and another for providing contextual information. Our method is well suited for distributed-memory parallel computers and clusters. Experimental results indicate that the method achieves near-linear speedup and provides the ability to scale to larger data sets than previously demonstrated.
Keywords
biocomputing; error correction; parallel algorithms; DNA sequencing; Reptile; low error rates; parallel algorithm; parallelizing error correction methods; redundant sampling; sequence errors; spectrum-based short read error correction; Computers; DNA; Error analysis; Error correction; Genomics; Hamming distance; Throughput; genome assembly; high-throughput sequencing; next-gen sequencing; parallel error correction; sequence base calling; short read error correction
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location
Shanghai
ISSN
1530-2075
Print_ISBN
978-1-4673-0975-2
Type
conf
DOI
10.1109/IPDPS.2012.16
Filename
6267824