DocumentCode :
2452452
Title :
Beyond Identity- When Classical Homology Searching Fails, Why, and What you Can do About It
Author :
Ray, William C. ; Ozer, Hatice G. ; Armbruster, David W. ; Daniels, Charles J.
Author_Institution :
Battelle Center for Math. Med., Nationwide Children's Hosp., Columbus, OH, USA
fYear :
2009
fDate :
15-17 June 2009
Firstpage :
51
Lastpage :
56
Abstract :
Multiple Sequence Alignments (MSAs) of both protein and nucleic-acid sequences are a ubiquitous method for modeling sequence families, used across every biological domain. Despite their utility, MSAs and methods derived from them fail to capture interpositional relationships that can be as critical to family membership as positional identities. We have recently developed novel methods, MAVL and StickWRLD, to quantitate and visualize additional features of sequence family models, and with them we have identified residue-level interpositional dependencies that are critical indicators of family membership in many sequence families. Some of these dependencies cannot be modeled by any existing modeling method, including Hidden Markov Models. In certain cases, the dependencies are strong enough that all common methods score sequences that are explicitly excluded from the family as better candidates than any actual member. The tRNA intron-endonuclease targets in the Archaea are such a family. Originally characterized as introns excised from archaeal tRNAs, some of which function as guide RNAs that target O-methylation of the ribosomal RNAs, these sequences have a very short characteristic signature and tolerate significant divergence. Base conservation alone carries too little information to build useful scoring models. Using our tools, we have identified critical residue interdependencies within the endonuclease target that enable detection of introns in whole-genome sequence. Many of these introns occur outside tRNAs, including some that are excised from protein-coding mRNAs. The dependencies we identify correspond to a Markov network of relationships over the positional identities. The contribution of each node's Markov blanket is incorporated by blending it with the positional conservation using a voting algorithm. In this paper we present the results of this analysis and the generalization of our modeling method to arbitrary RNA families, which allows models of similar power to be developed for any RNA family.
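The abstract's central claim is that a model built only from per-column conservation cannot tell family members from sequences that break an interpositional dependency. The sketch below illustrates that idea on a toy RNA alignment. It is not MAVL, StickWRLD, or the authors' voting algorithm: it assumes a gap-free alignment over the alphabet ACGU, flags a column pair as dependent when some residue pair's observed count differs from its expected count (under column independence) by at least a hypothetical threshold `min_residual`, and swaps in a plain weighted sum of log-odds terms where the paper uses voting. All function names and parameters (`column_freqs`, `pair_residuals`, `dependent_pairs`, `score`, `weight`, `pseudocount`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

ALPHABET = "ACGU"  # assumption: gap-free RNA alignment


def column_freqs(msa, pseudocount=0.5):
    """Per-column residue frequencies, smoothed with a pseudocount."""
    n = len(msa)
    total = n + pseudocount * len(ALPHABET)
    freqs = []
    for i in range(len(msa[0])):
        counts = Counter(seq[i] for seq in msa)
        freqs.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return freqs


def pair_residuals(msa, i, j):
    """Observed minus expected count of each residue pair at columns i and j,
    where 'expected' assumes the two columns are independent."""
    n = len(msa)
    ci = Counter(seq[i] for seq in msa)
    cj = Counter(seq[j] for seq in msa)
    cij = Counter((seq[i], seq[j]) for seq in msa)
    return {(a, b): cij[(a, b)] - ci[a] * cj[b] / n
            for a in ALPHABET for b in ALPHABET}


def dependent_pairs(msa, min_residual=1.0):
    """Column pairs whose largest |observed - expected| meets the threshold."""
    pairs = []
    for i, j in combinations(range(len(msa[0])), 2):
        residuals = pair_residuals(msa, i, j)
        if max(abs(v) for v in residuals.values()) >= min_residual:
            pairs.append((i, j))
    return pairs


def score(seq, msa, weight=1.0, pseudocount=0.5, background=0.25):
    """Positional log-odds plus a weighted pairwise term for dependent columns.
    weight=0.0 gives a purely positional (profile-like) score."""
    n = len(msa)
    freqs = column_freqs(msa, pseudocount)
    positional = sum(math.log(freqs[i][c] / background) for i, c in enumerate(seq))
    pairwise = 0.0
    for i, j in dependent_pairs(msa):
        # Smoothed joint frequency of this sequence's residue pair at (i, j).
        joint = sum(1 for s in msa if s[i] == seq[i] and s[j] == seq[j])
        p_pair = (joint + pseudocount) / (n + pseudocount * len(ALPHABET) ** 2)
        # Log-ratio of joint frequency to the product of column frequencies.
        pairwise += math.log(p_pair / (freqs[i][seq[i]] * freqs[j][seq[j]]))
    return positional + weight * pairwise
```

For a toy family whose first two columns covary like a base pair, for example `["AU", "GC", "AU", "GC", "AU", "GC"]`, the purely positional score (`weight=0.0`) gives the non-member "AC" exactly the same score as the member "AU", because each column is individually 50/50. Adding the pairwise term ranks "AU" above "AC". This is the kind of blind spot the abstract describes, in miniature.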
Keywords :
Markov processes; biology computing; genomics; molecular biophysics; proteins; MAVL; Markov network; O-methylation; StickWRLD; genomic sequence; homology searching; multiple sequence alignments; nucleic-acid sequences; protein sequences; tRNA intron-endonuclease; voting algorithm; Bioinformatics; Biological system modeling; Biomedical informatics; Collaboration; Hidden Markov models; Hospitals; Pediatrics; Proteins; RNA; Visualization; alignments; homology; modeling; searching;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics, 2009. OCCBIO '09. Ohio Collaborative Conference on
Conference_Location :
Cleveland, OH
Print_ISBN :
978-0-7695-3685-9
Type :
conf
DOI :
10.1109/OCCBIO.2009.23
Filename :
5159160