• DocumentCode
    3286159
  • Title
    Bayesian Basecalling for DNA Sequence Analysis using Hidden Markov Models
  • Author
    Liang, Kuo-Ching ; Wang, Xiaodong ; Anastassiou, Dimitris

  • Author_Institution
    Dept. of Electr. Eng., Columbia Univ., New York, NY
  • fYear
    2006
  • fDate
    22-24 March 2006
  • Firstpage
    1599
  • Lastpage
    1604
  • Abstract
    It has been shown that electropherograms of DNA sequences can be modelled with hidden Markov models (HMMs). Basecalling, the procedure that determines the sequence of bases from a given electropherogram, can then be performed using the Viterbi algorithm. A training step is required prior to basecalling in order to estimate the HMM parameters. In this paper, we propose a Bayesian approach which employs the Markov chain Monte Carlo (MCMC) method to perform basecalling. Such an approach not only allows one to naturally encode prior biological knowledge into the basecalling algorithm, but also exploits both the training data and the basecalling data in estimating the HMM parameters, leading to more accurate estimates. Using the recently sequenced genome of the organism Legionella pneumophila, we show that performance similar to that of the state-of-the-art basecalling algorithm, in terms of total errors, can be achieved even when a simple Gaussian model is assumed for the emission densities.
  • Keywords
    Bayes methods; DNA; Gaussian processes; Monte Carlo methods; hidden Markov models; Bayesian basecalling approach; DNA sequence analysis; Gaussian model; HMM parameter; Legionella pneumophila; MCMC method; Markov chain Monte Carlo method; hidden Markov model; Bayesian methods; Biological information theory; DNA; Genomics; Hidden Markov models; Monte Carlo methods; Parameter estimation; Sequences; Training data; Viterbi algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Sciences and Systems, 2006 40th Annual Conference on
  • Conference_Location
    Princeton, NJ
  • Print_ISBN
    1-4244-0349-9
  • Electronic_ISBN
    1-4244-0350-2
  • Type
    conf
  • DOI
    10.1109/CISS.2006.286391
  • Filename
    4068057