DocumentCode :
2530924
Title :
Predicting Markov Chain Order in Genomic Sequences
Author :
Heath, Lenwood S. ; Pati, Amrita
Author_Institution :
Virginia Tech, Blacksburg
fYear :
2007
fDate :
2-4 Nov. 2007
Firstpage :
159
Lastpage :
164
Abstract :
Genomic sequences display characteristic features at various scales, ranging from oligonucleotide frequencies to large organizational units such as genes. The generation of such a sequence, defined as a string over the alphabet Sigma_DNA = {A, C, T, G}, can be approximated by a formal machine, a Markov chain having strings as states, whose parameters lend unique characteristics to the sequence. We present a formal mathematical framework that analyzes this approximation in terms of transition probabilities of words at various scales. Within this framework, we present an algorithm that estimates the order omega of the Markov chain hypothesized to have generated a given sequence S. Consider the probability of transition from string alpha to string gamma, both of length w, computed using both order w and order w-1 Markov chains. The expected difference of the two probabilities thus obtained is zero, and demonstrates a sharp positive transition for values of w between omega and omega+1. Both mathematical and experimental results are obtained that explain the general behavior of the algorithm.
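The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain k-mer counting for the probability estimates, and the names `transition_probs`, `order_gap`, and `sample_order1` are hypothetical helpers introduced here. For a candidate order k, it measures how much the estimated probability of the next symbol changes when the context is shortened from k to k-1 symbols.

```python
from collections import Counter
import random

ALPHABET = "ACGT"

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def transition_probs(seq, k):
    """Estimate P(next symbol | previous k symbols) from (k+1)-mer counts.

    The result is keyed by the (k+1)-mer 'context + next symbol'.
    """
    counts = kmer_counts(seq, k + 1)
    totals = Counter()
    for kmer, n in counts.items():
        totals[kmer[:k]] += n  # occurrences of the context followed by any symbol
    return {kmer: n / totals[kmer[:k]] for kmer, n in counts.items()}

def order_gap(seq, k):
    """Frequency-weighted mean absolute difference between the order-k and
    order-(k-1) estimates of the same transition (an illustrative score,
    assumed here, not taken from the paper). Requires k >= 1."""
    high = transition_probs(seq, k)      # keyed by (k+1)-mers
    low = transition_probs(seq, k - 1)   # keyed by k-mers
    counts = kmer_counts(seq, k + 1)
    total = sum(counts.values())
    return sum(n / total * abs(high[kmer] - low[kmer[1:]])
               for kmer, n in counts.items())

def sample_order1(n, seed=0):
    """Toy order-1 chain: usually step to a fixed successor, else pick uniformly."""
    rng = random.Random(seed)
    successor = {"A": "C", "C": "G", "G": "T", "T": "A"}
    seq = ["A"]
    for _ in range(n - 1):
        if rng.random() < 0.9:
            seq.append(successor[seq[-1]])
        else:
            seq.append(rng.choice(ALPHABET))
    return "".join(seq)

if __name__ == "__main__":
    s = sample_order1(20000)
    for k in (1, 2, 3):
        print(k, round(order_gap(s, k), 4))
```

On a sequence drawn from an order-1 chain, the score should be large at k = 1 and close to zero for larger k, apart from sampling noise. Under these assumptions, the largest k with a clearly non-zero gap is a rough estimate of the order.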
Keywords :
DNA; Markov processes; biology computing; genetics; Markov chain order prediction; formal mathematical framework; genomic sequences; oligonucleotide frequencies; transition probabilities; words; Bioinformatics; Computer displays; Computer science; Entropy; Fluctuations; Frequency estimation; Genomics; Sequences; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine, 2007. BIBM 2007. IEEE International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3031-4
Type :
conf
DOI :
10.1109/BIBM.2007.24
Filename :
4413050