• DocumentCode
    2452452
  • Title

    Beyond Identity: When Classical Homology Searching Fails, Why, and What You Can Do About It

  • Author

    Ray, William C. ; Ozer, Hatice G. ; Armbruster, David W. ; Daniels, Charles J.

  • Author_Institution
    Battelle Center for Math. Med., Nationwide Children's Hosp., Columbus, OH, USA
  • fYear
    2009
  • fDate
    15-17 June 2009
  • Firstpage
    51
  • Lastpage
    56
  • Abstract
    Multiple sequence alignments (MSAs) of protein and nucleic-acid sequences are a ubiquitous method for modeling sequence families across every biological domain. Despite their utility, MSAs and the methods derived from them fail to capture interpositional relationships that can be as critical to family membership as positional identities. We have recently developed novel methods, MAVL and StickWRLD, to quantitate and visualize additional features of sequence family models, and have identified residue-level interpositional dependencies that are critical indicators of family membership in many sequence families. Some of these dependencies cannot be modeled by any existing modeling method, including Hidden Markov Models. In certain cases, the dependencies are strong enough that all common methods score sequences that are explicitly excluded from the family as better candidates than any actual member. The tRNA intron-endonuclease targets in the Archaea are such a family. These sequences were originally characterized as introns excised from archaeal tRNAs, some of which function as guide RNAs that target O-methylation of the ribosomal RNAs; they have a very short characteristic signature and tolerate significant divergence. Base conservation alone carries too little information to build useful scoring models. Using our tools, we have identified critical residue interdependencies within the endonuclease target that enable detection of introns in whole-genome sequence. Many of these introns occur outside tRNAs, including some that are excised from protein-coding mRNAs. The dependencies we identify correspond to a Markov network of relationships over the positional identities. The contribution of each node's Markov blanket is incorporated by blending it with the positional conservation using a voting algorithm. In this paper we present the results of this analysis and the generalization of our modeling method to arbitrary RNA families, which allows models of similar power to be developed for arbitrary RNA families.
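    The abstract's central point, that a positional-conservation model can score a sequence violating an interpositional dependency as highly as a true member, can be illustrated with a small sketch. This is not MAVL, StickWRLD, or the authors' Markov-blanket voting method. It is a minimal stand-in under stated assumptions: pairwise mutual information substitutes for the dependency detection, only pairs of positions are considered rather than full Markov blankets, and a simple two-part vote is blended with a hypothetical `weight` parameter. All function names (`build_model`, `blended_score`, and so on) are hypothetical.

    ```python
    import math
    from collections import Counter
    from itertools import combinations


    def column_frequencies(alignment, pseudocount=0.5, alphabet="ACGU"):
        """Per-position residue frequencies with pseudocounts (positional conservation)."""
        freqs = []
        for i in range(len(alignment[0])):
            counts = Counter(seq[i] for seq in alignment)
            total = len(alignment) + pseudocount * len(alphabet)
            freqs.append({a: (counts[a] + pseudocount) / total for a in alphabet})
        return freqs


    def mutual_information(alignment, i, j):
        """Mutual information (bits) between columns i and j: a simple
        stand-in for an interpositional dependency measure (assumption)."""
        n = len(alignment)
        pi = Counter(s[i] for s in alignment)
        pj = Counter(s[j] for s in alignment)
        pij = Counter((s[i], s[j]) for s in alignment)
        mi = 0.0
        for (a, b), c in pij.items():
            p_ab = c / n
            mi += p_ab * math.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
        return mi


    def build_model(alignment, mi_threshold=0.5):
        """Hypothetical model: column frequencies plus the residue pairs
        observed at every strongly dependent pair of positions."""
        freqs = column_frequencies(alignment)
        pairs = [(i, j) for i, j in combinations(range(len(alignment[0])), 2)
                 if mutual_information(alignment, i, j) >= mi_threshold]
        pair_tables = {(i, j): {(s[i], s[j]) for s in alignment} for i, j in pairs}
        return freqs, pair_tables


    def blended_score(candidate, freqs, pair_tables, weight=0.5, background=0.25):
        """Each position votes if its residue is above background frequency;
        each dependent pair votes if its residue combination was seen in the
        family. The two vote fractions are blended by `weight` (assumption)."""
        pos_votes = [freqs[i].get(r, 0.0) > background for i, r in enumerate(candidate)]
        pair_votes = [(candidate[i], candidate[j]) in seen
                      for (i, j), seen in pair_tables.items()]
        pos = sum(pos_votes) / len(pos_votes)
        if not pair_votes:
            return pos
        pair = sum(pair_votes) / len(pair_votes)
        return (1 - weight) * pos + weight * pair


    # Toy family: positions 0 and 1 covary (G-C or C-G); position 2 is a conserved A.
    family = ["GCAU", "CGAU", "GCAA", "CGAA"]
    freqs, pair_tables = build_model(family)
    print(blended_score("GCAU", freqs, pair_tables))  # family member
    print(blended_score("GGAU", freqs, pair_tables))  # breaks the 0-1 dependency
    ```

    With `weight=0` the score reduces to positional voting alone, and both `GCAU` and `GGAU` receive 1.0, because every residue in `GGAU` is individually common in its column. Including the dependent pair (0, 1) drops `GGAU` to 0.5 while the member stays at 1.0.
    
    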
  • Keywords
    Markov processes; biology computing; genomics; molecular biophysics; proteins; MAVL; Markov network; O-methylation; StickWRLD; genomic sequence; homology searching; multiple sequence alignments; nucleic-acid sequences; protein sequences; tRNA intron-endonuclease; voting algorithm; Bioinformatics; Biological system modeling; Biomedical informatics; Collaboration; Hidden Markov models; Hospitals; Pediatrics; Proteins; RNA; Visualization; alignments; homology; modeling; searching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2009 Ohio Collaborative Conference on Bioinformatics (OCCBIO '09)
  • Conference_Location
    Cleveland, OH
  • Print_ISBN
    978-0-7695-3685-9
  • Type

    conf

  • DOI
    10.1109/OCCBIO.2009.23
  • Filename
    5159160