• DocumentCode
    2957919
  • Title
    A Parallel Algorithm for Spectrum-based Short Read Error Correction
  • Author
    Shah, Ankit R. ; Chockalingam, Sriram ; Aluru, Srinivas
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Bombay, Mumbai, India
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    60
  • Lastpage
    70
  • Abstract
    Correcting sequence errors in high-throughput DNA sequencing by taking advantage of redundant sampling and low error rates is often an important first step in applications of this technology. Consequently, a number of error correction methods have been developed in recent years. Due to an order-of-magnitude throughput gain per year, some of these technologies now generate upwards of a billion reads per run. In this paper, we present an algorithm for parallelizing error correction methods that are based on the frequency spectrum of k-mers observed in the input reads. Based on this, we present a parallelization of Reptile, a recently introduced error correction method that employs frequency spectra for two different lengths, one for identifying correction possibilities and another for providing contextual information. Our method is well suited to distributed-memory parallel computers and clusters. Experimental results indicate that the method achieves near-linear speedup and can scale to larger data sets than previously demonstrated.
  • Keywords
    biocomputing; error correction; parallel algorithms; DNA sequencing; Reptile; low error rates; parallel algorithm; parallelizing error correction methods; redundant sampling; sequence errors; spectrum-based short read error correction; Computers; DNA; Error analysis; Error correction; Genomics; Hamming distance; Throughput; genome assembly; high-throughput sequencing; next-gen sequencing; parallel error correction; sequence base calling; short read error correction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-0975-2
  • Type
    conf
  • DOI
    10.1109/IPDPS.2012.16
  • Filename
    6267824