Title :
Reconstructing latent periods in genome sequences with insertions and deletions
Author :
Arora, Raman ; Dewey, Colin ; Sethares, William A.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Wisconsin-Madison, Madison, WI, USA
Abstract :
Tandem and latent repeats in genome sequences provide insight into the structural and functional roles of those sequences. Such regions are modeled as cyclostationary processes, generated by a collection of information sources acting in a cyclic manner. Maximum likelihood (ML) estimates of the cyclostationary profiles and of the statistical period of such subsequences are easily computed. However, in the presence of insertions and deletions, the ML estimators lose much of their ability to identify the periods accurately. This paper extends the cyclic model to a profile hidden Markov model (PHMM) to account for insertions and deletions. An iterative algorithm is developed to learn the parameters of the PHMM, and the Viterbi algorithm is employed to find the most likely path through the state space. This reconstructs likely insertions and deletions in the sequence and yields better estimates of the statistical period and cyclostationary profiles than the ML approach. Experimental results are provided for simulated sequences as well as for the chromosome 1 sequence from the human genome.
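The cyclic model in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each phase of the period emits symbols independently from its own distribution, so the ML profile for a candidate period is the relative symbol frequency at each phase. Because the raw likelihood never decreases when the period is doubled, the sketch adds a BIC-style penalty to choose among candidate periods; that penalty, the function names, and the toy sequence are assumptions made here for illustration and do not come from the paper.

```python
import math
from collections import Counter


def cyclic_profiles(seq, period):
    """ML profile for a candidate period: relative symbol frequencies per phase."""
    counts = [Counter() for _ in range(period)]
    for i, symbol in enumerate(seq):
        counts[i % period][symbol] += 1
    profiles = []
    for phase_counts in counts:
        total = sum(phase_counts.values())
        profiles.append({s: c / total for s, c in phase_counts.items()})
    return profiles


def log_likelihood(seq, period):
    """Log-likelihood of seq under the fitted cyclic profiles for this period."""
    profiles = cyclic_profiles(seq, period)
    return sum(math.log(profiles[i % period][s]) for i, s in enumerate(seq))


def estimate_period(seq, max_period, alphabet_size=4):
    """Pick the period maximizing a BIC-penalized log-likelihood.

    The penalty is an assumption of this sketch, not the paper's criterion:
    each phase contributes (alphabet_size - 1) free parameters.
    """
    n = len(seq)

    def score(period):
        n_params = period * (alphabet_size - 1)
        return log_likelihood(seq, period) - 0.5 * n_params * math.log(n)

    return max(range(1, max_period + 1), key=score)


# Hypothetical toy data: a perfect period-5 repeat, and the same
# sequence with a single symbol deleted in the middle.
clean = "ACGGT" * 40
with_deletion = clean[:100] + clean[101:]

best_period = estimate_period(clean, max_period=10)
ll_clean = log_likelihood(clean, 5)
ll_deleted = log_likelihood(with_deletion, 5)
```

On the toy data, a single deletion shifts every later symbol to a different phase, so the fitted profiles blur and the likelihood at the true period drops. This is the degradation that the PHMM extension described in the abstract is meant to address.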
Keywords :
genomics; hidden Markov models; maximum likelihood estimation; molecular biophysics; state-space methods; Viterbi algorithm; cyclostationary process; genome sequence; latent period reconstruction; maximum likelihood estimation method; profile hidden Markov model; state space method; Bioinformatics; Biomedical engineering; DNA; Fourier transforms; Genomics; Hidden Markov models; Iterative algorithms; Maximum likelihood estimation; Random variables; Sequences
Conference_Title :
Genomic Signal Processing and Statistics, 2009. GENSIPS 2009. IEEE International Workshop on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4244-4761-9
Electronic_ISBN :
978-1-4244-4762-6
DOI :
10.1109/GENSIPS.2009.5174377