Title :
Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures
Author :
Matsui, Hiroshi ; Sato, Kengo ; Sakakibara, Yasubumi
Author_Institution :
Dept. of Biosciences & Informatics, Keio Univ., Yokohama, Japan
Abstract :
Since the whole genome sequences for many species are currently available, computational predictions of RNA secondary structures and computational identifications of those non-coding RNA regions by comparative genomics have become important, and require more advanced alignment methods. Recently, an approach of structural alignments for RNA sequences has been introduced to solve these problems. By structural alignments, we mean a pair-wise alignment to align an unfolded RNA sequence into a folded RNA sequence of known secondary structure. Pair HMMs on tree structures (PHMMTSs) proposed by Sakakibara are efficient automata-theoretic models for structural alignments of RNA secondary structures, but are incapable of handling pseudoknots. On the other hand, tree adjoining grammars (TAGs) is a subclass of context-sensitive grammar, which is suitable for modeling pseudoknots. Our goal is to extend PHMMTSs by incorporating TAGs to be able to handle pseudoknots. We propose the pair stochastic tree adjoining grammars (PSTAGs) for modeling RNA secondary structures including pseudoknots and show the strong experimental evidences that modeling pseudoknot structures significantly improves the prediction accuracies of RNA secondary structures. First, we extend the notion of PHMMTSs defined on alignments of \´trees\´ to PSTAGs defined on alignments of "TAG (derivation) trees", which represent a top-down parsing process of TAGs and are functionally equivalent to derived trees of TAGs. Second, we modify PSTAGs so that it takes as input a pair of a linear sequence and a TAG tree representing a pseudoknot structure of RNA to produce a structural alignment. Then, we develop a polynomial-time algorithm for obtaining an optimal structural alignment by PSTAGs, based on dynamic programming parser. We have done several computational experiments for predicting pseudoknots by PSTAGs, and our computational experiments suggests that prediction of RNA pseudoknot structures by our method are more efficient and biologically plausible than by other conventional methods. The binary code for PSTAG method is freely available from our website at http://www.dna.bio.keio.ac.jp/pstag/.
Keywords :
biology computing; context-sensitive grammars; genetics; hidden Markov models; macromolecules; molecular biophysics; physiological models; stochastic automata; trees (mathematics); HMMs; RNA secondary structures; RNA sequences; automata-theoretic models; comparative genomics; context-sensitive grammar; noncoding RNA regions; pair stochastic tree adjoining grammars; polynomial-time algorithm; pseudoknot RNA structural alignments; top-down parsing process; whole genome sequences; Bioinformatics; Biology computing; Context modeling; Genomics; Hidden Markov models; Predictive models; RNA; Stochastic processes; Technical Activities Guide -TAG; Tree data structures;
Conference_Titel :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
DOI :
10.1109/CSB.2004.1332442