• DocumentCode
    3533341
  • Title
    Protein function prediction using decision trees
  • Author
    Yedida, Venkata Rama Kumar Swamy; Chan, Chien-Chung; Duan, Zhong-Hui
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Akron, Akron, OH
  • fYear
    2008
  • fDate
    3-5 Nov. 2008
  • Firstpage
    193
  • Lastpage
    199
  • Abstract
    We present an automated protein function prediction system based on a set of homologous proteins and Gene Ontology categories. A novel measure based on a set of optimal local alignments is used to identify the homologues. The biological functions of the homologous proteins are characterized with Gene Ontology annotations. Protein function prediction is performed with data mining models based on decision trees. The tree models depict the interconnections between biological functional groups, which reflect, to a certain degree, the underlying biological pathways. The models are trained and tested using the complete proteome of the model organism yeast (Saccharomyces cerevisiae). The results of this study show that model accuracy and prediction accuracy vary from one functional group to another. These variations illustrate certain limitations of protein function prediction methods based on sequence similarity. However, the basic assumption that similar sequences result in similar functions is still largely valid. The models developed outperform methods that depend solely on the annotations of homologous proteins, although the model is intended as a preliminary tool for protein function prediction, and its predictions need to be verified through other means. The results show that the prediction accuracies for most of the functional groups are over 80%.
  • Keywords
    biology computing; data mining; decision trees; macromolecules; proteins; automated protein function prediction system; biological functional groups; biological pathways; data mining models; decision trees; gene ontology; homologous proteins; Accuracy; Biological system modeling; Data mining; Decision trees; Fungi; Ontologies; Organisms; Predictive models; Protein engineering; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops, 2008. BIBMW 2008. IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4244-2890-8
  • Type
    conf
  • DOI
    10.1109/BIBMW.2008.4686235
  • Filename
    4686235