Title :
Using Hidden Markov Modeling in DNA Sequencing
Author :
Nelson, Ruben ; Foo, Simon ; Weatherspoon, Mark
Author_Institution :
Florida State Univ., Tallahassee
Abstract :
Hidden Markov models (HMMs) have demonstrated their usefulness in statistics and pattern recognition, particularly in speech recognition and handwriting recognition. The same principles of statistics and probability can be applied in genetics. DNA has four primary bases (adenine, guanine, thymine, and cytosine), which pair together in the nucleotide chain. The length of a nucleotide chain, however, can be uncertain. The DNA sequence constitutes the heritable genetic information in nuclei that forms the basis for the developmental programs of all living organisms. Determining the DNA sequence is therefore useful in studying fundamental biological processes, as well as in diagnostic or forensic research. In this study, we use HMMs to determine DNA sequence likelihoods. The first 1000 bases of rice chromosomes serve as the training sequence, and the resulting transition and emission probabilities are used to determine a probable DNA sequence for the next 2000 bases. This predicted sequence should be comparable to the actual sequence. Experimentation did not show this to be the case, however, despite previous experiments showing otherwise: only a fourth of a nucleotide sequence was ever classified correctly.
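The likelihood computation the abstract refers to can be illustrated with a minimal sketch of the forward algorithm. It shows how transition and emission probabilities combine into the log-likelihood of a base sequence. The abstract does not give the model structure, so everything specific here is an assumption made for illustration: the two hidden states, the values in START, TRANS, and EMIT, and the function name forward_log_likelihood. None of it is taken from the paper.

```python
import math

BASES = "ACGT"  # observation alphabet; also the column order of the emission rows


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq) under an HMM, using the scaled forward algorithm.

    start[i]    : probability of beginning in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits BASES[k]
    """
    n_states = len(start)
    log_lik = 0.0
    alpha = None  # rescaled forward probabilities for the current position
    for base in seq:
        obs = BASES.index(base)
        if alpha is None:
            # First position: start distribution times emission.
            alpha = [start[j] * emit[j][obs] for j in range(n_states)]
        else:
            # Later positions: propagate through transitions, then emit.
            alpha = [
                emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Rescale to avoid underflow on long sequences; the log of each
        # scale factor accumulates into the total log-likelihood.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model (illustrative values, not from the paper):
# state 0 favours A/T, state 1 favours C/G.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1],
         [0.2, 0.8]]
EMIT = [[0.35, 0.15, 0.15, 0.35],
        [0.15, 0.35, 0.35, 0.15]]

# A model whose states all emit every base with probability 1/4.
UNIFORM_EMIT = [[0.25] * 4, [0.25] * 4]

print(forward_log_likelihood("AATTACGCGC", START, TRANS, EMIT))
print(forward_log_likelihood("ACGT", START, TRANS, UNIFORM_EMIT))
```

With UNIFORM_EMIT every base has probability 1/4 regardless of the hidden state, so the second call returns 4 * log(1/4). This is the likelihood a model carrying no information about a four-letter alphabet would assign.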
Keywords :
biocomputing; handwriting recognition; hidden Markov models; speech recognition; statistics; DNA sequencing; adenine; cytosine; guanine; hidden Markov modeling; nucleotide chain; pattern recognition; probability; thymine; DNA; Genetics; Organisms; Sequences; Writing
Conference_Title :
40th Southeastern Symposium on System Theory (SSST 2008)
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-1806-0
Electronic_ISBN :
0094-2898
DOI :
10.1109/SSST.2008.4480223