• DocumentCode
    1378924
  • Title
    Discriminative Motif Finding for Predicting Protein Subcellular Localization
  • Author
    Lin, Tien-ho ; Murphy, Robert F. ; Bar-Joseph, Ziv
  • Author_Institution
    Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • Volume
    8
  • Issue
    2
  • fYear
    2011
  • Firstpage
    441
  • Lastpage
    451
  • Abstract
    Many methods have been described to predict the subcellular location of proteins from sequence information. However, most of these methods either rely on global sequence properties or use a set of known protein targeting motifs to predict protein localization. Here, we develop and test a novel method that identifies potential targeting motifs using a discriminative approach based on hidden Markov models (discriminative HMMs). These models search for motifs that are present in one compartment but absent in other, nearby compartments by utilizing a hierarchical structure that mimics the protein sorting mechanism. We show that both discriminative motif finding and the hierarchical structure improve localization prediction on a benchmark data set of yeast proteins. The motifs identified can be mapped to known targeting motifs, and they are more conserved than the average protein sequence. Using our motif-based predictions, we can identify potential annotation errors in public databases for the location of some of the proteins. A software implementation and the data set described in this paper are available from http://murphylab.web.cmu.edu/software/2009_TCBB_motif/.
  • Keywords
    bioinformatics; cellular biophysics; hidden Markov models; physiological models; proteins; proteomics; discriminative motif finding; protein sequence; protein sorting; protein subcellular localization; protein targeting; sequence information; Biomedical engineering; Cells (biology); Computational biology; Machine learning; Sequences; Terrorism; USA Councils; maximal mutual information estimate; protein localization; Algorithms; Amino Acid Motifs; Databases, Protein; Fungal Proteins; Markov Chains; Protein Sorting Signals; Sequence Alignment; Sequence Analysis, Protein
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2009.82
  • Filename
    5374366