  • DocumentCode
    2998428
  • Title
    Parallel Pair-HMM SNP Detection
  • Author
    Clement, Nathan L. ; Shepherd, Brent A. ; Bodily, Paul ; Tumur-Ochir, Sukhbat ; Gim, Younghoon ; Snell, Quinn ; Clement, Mark J. ; Johnson, W. Evan
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    675
  • Lastpage
    683
  • Abstract
    I. MOTIVATION: Because each instrument run generates massive amounts of data, next-generation sequencing technologies have presented researchers with unique analytical challenges that require innovative, computationally efficient statistical solutions. Here we present a parallel implementation of a probabilistic Pair-Hidden Markov Model for base calling and SNP detection in next-generation sequencing data. Our approach incorporates multiple sources of error into the base calling procedure, which leads to more accurate results. In addition, our approach applies a likelihood ratio test that provides researchers with straightforward SNP calling cutoffs based on a p-value cutoff or a false discovery control. II. RESULTS: We have developed GNUMAP-SNP, a highly accurate approach for identifying SNPs in next-generation sequencing data. By using a novel probabilistic Pair-Hidden Markov Model, GNUMAP-SNP accounts for uncertainty in both the read calls and the read mapping in an unbiased fashion. Our results show that GNUMAP-SNP has both high sensitivity and high specificity throughout the genome, especially in repeat regions or in areas with low read coverage. In addition, we propose a statistical framework that accounts for background noise using straightforward statistical cutoffs that filter out false-positive results. The parallel implementation of SNP calling achieves near-linear speedup on distributed-memory or shared-memory platforms. III. AVAILABILITY: GNUMAP-SNP is available as a module in the GNUMAP probabilistic read mapping software. GNUMAP is freely available for download at http://dna.cs.byu.edu/gnumap/.
  • Keywords
    bioinformatics; distributed shared memory systems; genetics; hidden Markov models; probability; GNUMAP probabilistic read mapping software; SNP calling cutoffs; background noise; base calling procedure; distributed memory; false discovery control; genome; likelihood ratio testing; parallel pair-HMM SNP detection; probabilistic pair-hidden Markov model; sequencing technology; shared memory platform; statistical cutoff; statistical solution; Bioinformatics; Frequency modulation; Genomics; Markov processes; Next generation networking; Probabilistic logic; Probability; biology computing; next-generation sequencing; parallel computing; sequence mappers; short-read mapping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0974-5
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2012.84
  • Filename
    6270706