• DocumentCode
    1694447
  • Title

    Enhancing Protein Domain Detection Using Domain Co-occurrence and Domain Exclusion

  • Author

    Ghouila, Amel ; Gascuel, Olivier ; Yahia, S.B. ; Bréhélin, Laurent

  • Author_Institution
    Methods & Algorithms for Bioinf. LIRMM, Univ. Montpellier 2, Montpellier, France
  • fYear
    2012
  • Firstpage
    223
  • Lastpage
    228
  • Abstract
    Among the relevant annotations that can be attributed to a protein, domains occupy a key position. Protein domains are sequential and structural motifs that are found independently in different proteins and in different combinations. One of the most widely used domain schemes is the Pfam database, which is a collection of protein domains and families. Each family in Pfam is represented by a multiple sequence alignment and a Hidden Markov Model (HMM). When analyzing a new protein sequence, each Pfam HMM is used to compute a score measuring the similarity between the sequence and the domain. If the score is above a given threshold provided by Pfam, the presence of the domain can be asserted in the protein. However, when applied to proteins of organisms with high evolutionary distance from classical model organisms, this strategy may miss several domains. We recently proposed a method, the Co-Occurrence Domain Detection approach (CODD), that improves the sensitivity of Pfam domain detection by exploiting the tendency of domains to appear preferentially with a few other favorite domains in a protein. Here, we propose to integrate domain exclusion information to prune false positive domains that are in conflict with other domains of the protein. Applied to P. falciparum and L. major proteins, we show that this strategy substantially reduces the proportion of false positives among the new domains predicted by CODD, while preserving as much as possible the sensitivity of the approach.
  • Keywords
    bioinformatics; hidden Markov models; proteins; CODD algorithm; L. major proteins; P. falciparum proteins; Pfam HMM database; Pfam domain detection sensitivity improvement; cooccurrence domain detection approach; domain exclusion information integration; evolutionary distance; false positive domains; hidden Markov model; information annotations; organism proteins; protein families; protein sequence alignment; score computation; sequential motifs; structural motifs; Bioinformatics; Databases; Hidden Markov models; Organisms; Proteins; Sensitivity; Co-occurrence; Domain prediction; HMM; domain exclusion;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications (DEXA), 2012 23rd International Workshop on
  • Conference_Location
    Vienna
  • ISSN
    1529-4188
  • Print_ISBN
    978-1-4673-2621-6
  • Type

    conf

  • DOI
    10.1109/DEXA.2012.45
  • Filename
    6327430