DocumentCode
3533341
Title
Protein function prediction using decision trees
Author
Yedida, Venkata Rama Kumar Swamy ; Chan, Chien-Chung ; Duan, Zhong-Hui
Author_Institution
Dept. of Comput. Sci., Univ. of Akron, Akron, OH
fYear
2008
fDate
3-5 Nov. 2008
Firstpage
193
Lastpage
199
Abstract
We present an automated protein function prediction system based on a set of homologous proteins and Gene Ontology (GO) categories. A novel measure based on a set of optimal local alignments is used to identify the homologues, and the biological functions of the homologous proteins are characterized by their GO annotations. Function prediction is then performed with data mining models built from decision trees. The tree models depict the interconnections between biological functional groups, which reflect, to a certain degree, the underlying biological pathways. The models are trained and tested on the complete proteome of the model organism yeast (Saccharomyces cerevisiae). The results show that both model accuracy and prediction accuracy vary from one functional group to another. These variations illustrate certain limitations of protein function prediction methods based on sequence similarity; however, the basic assumption that similar sequences result in similar functions remains largely valid. The models outperform methods that depend solely on the annotations of homologous proteins, with prediction accuracies above 80% for most functional groups. Nevertheless, the models are intended as a preliminary tool for protein function prediction, and their predictions need to be verified by other means.
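The abstract describes a pipeline in which the GO annotations of a protein's homologues feed decision-tree models for each functional group. This record does not give the paper's homology measure, feature encoding, or tree learner, so the following is only a minimal sketch under stated assumptions. Each protein is represented by hypothetical binary features (here `hom_ribosome`, `hom_translation`, `hom_dna_repair`) that mark whether any homologue carries a given annotation. A generic ID3-style tree using information gain, not necessarily the authors' learner, then predicts membership in one target group.

```python
import math
from collections import Counter

# Hypothetical encoding (not taken from the paper): each protein is a dict that maps a
# feature name to True if at least one of its homologues carries that GO-based
# annotation. The label records whether the protein itself is in the target group.


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the binary feature with the largest information gain, or None."""
    n, base = len(labels), entropy(labels)
    best, best_gain = None, 0.0
    for f in features:
        yes = [y for r, y in zip(rows, labels) if r.get(f, False)]
        no = [y for r, y in zip(rows, labels) if not r.get(f, False)]
        if not yes or not no:  # a constant feature cannot split the data
            continue
        gain = base - len(yes) / n * entropy(yes) - len(no) / n * entropy(no)
        if gain > best_gain:
            best, best_gain = f, gain
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow an ID3-style tree; internal nodes test one feature, leaves hold a label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    f = best_split(rows, labels, features)
    if f is None:  # no feature reduces entropy any further
        return majority
    rest = [g for g in features if g != f]
    node = {"feature": f}
    for value in (True, False):
        idx = [i for i, r in enumerate(rows) if bool(r.get(f, False)) == value]
        node[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 rest, depth + 1, max_depth)
    return node


def predict(tree, row):
    """Follow feature tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree[bool(row.get(tree["feature"], False))]
    return tree


# Invented toy data: does each protein belong to a hypothetical "translation" group?
rows = [
    {"hom_ribosome": True,  "hom_translation": True,  "hom_dna_repair": False},
    {"hom_ribosome": False, "hom_translation": True,  "hom_dna_repair": False},
    {"hom_ribosome": True,  "hom_translation": False, "hom_dna_repair": False},
    {"hom_ribosome": False, "hom_translation": False, "hom_dna_repair": True},
    {"hom_ribosome": False, "hom_translation": False, "hom_dna_repair": False},
]
labels = [True, True, False, False, False]
tree = build_tree(rows, labels, ["hom_ribosome", "hom_translation", "hom_dna_repair"])
print(tree)  # {'feature': 'hom_translation', True: True, False: False}
```

On this invented data the tree splits on `hom_translation` alone, which shows in miniature how a learned tree can tie one functional group to annotations from another. The data are illustrative only and say nothing about the paper's results.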
Keywords
biology computing; data mining; decision trees; macromolecules; proteins; automated protein function prediction system; biological functional groups; biological pathways; data mining models; decision trees; gene ontology; homologous proteins; Accuracy; Biological system modeling; Data mining; Decision trees; Fungi; Ontologies; Organisms; Predictive models; Protein engineering; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedicine Workshops, 2008. BIBMW 2008. IEEE International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
978-1-4244-2890-8
Type
conf
DOI
10.1109/BIBMW.2008.4686235
Filename
4686235