Title :
Protein function prediction using decision trees
Author :
Yedida, Venkata Rama Kumar Swamy ; Chan, Chien-Chung ; Duan, Zhong-Hui
Author_Institution :
Dept. of Comput. Sci., Univ. of Akron, Akron, OH
Abstract :
We present an automated protein function prediction system based on a set of homologous proteins and Gene Ontology categories. A novel measure based on a set of optimal local alignments is used to identify the homologues. The biological functions of the homologous proteins are characterized with Gene Ontology annotations. Protein function prediction is performed with data mining models using decision trees. The tree models depict the interconnections between biological functional groups, which reflect, to a certain degree, the underlying biological pathways. The models are trained and tested using the complete proteome of the model organism yeast (Saccharomyces cerevisiae). The results of this study demonstrate that model accuracy and prediction accuracy vary from one functional group to another. These variations illustrate certain limitations of sequence-similarity-based protein function prediction methods. However, the basic assumption that similar sequences result in similar functions is still largely valid. The models developed outperform methods that depend solely on the annotations of homologous proteins, although the model is intended as a preliminary tool for protein function prediction and its predictions need to be verified through other means. The results show that the prediction accuracies for most of the functional groups are over 80%.
Keywords :
biology computing; data mining; decision trees; macromolecules; proteins; automated protein function prediction system; biological functional groups; biological pathways; data mining models; gene ontology; homologous proteins; Accuracy; Biological system modeling; Fungi; Ontologies; Organisms; Predictive models; Protein engineering; Testing
Conference_Title :
Bioinformatics and Biomedicine Workshops, 2008. BIBMW 2008. IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4244-2890-8
DOI :
10.1109/BIBMW.2008.4686235