DocumentCode :
2971018
Title :
Impact of the Predicted Protein Structural Content on Prediction of Structural Classes for the Twilight Zone Proteins
Author :
Kurgan, Lukasz A. ; Rahbari, Mandana ; Homaeian, Leila
Author_Institution :
Dept. of Electr. & Comput. Eng., Alberta Univ., Edmonton, Alta.
fYear :
2006
fDate :
Dec. 2006
Firstpage :
180
Lastpage :
186
Abstract :
This paper addresses in silico prediction of protein structural classes as defined in the SCOP database. SCOP defines a total of 11 classes, although the majority of proteins fall into 4 of them: all-alpha, all-beta, alpha/beta, and alpha+beta. The main goals of this paper are to experimentally evaluate the impact of predicted protein secondary structure content on structural class prediction and to develop a novel protein sequence representation. The experiments apply three protein sequence representations and four classifiers to the prediction of both 4 and 11 structural classes. The predictions are performed using a large dataset of low-homology (twilight zone) sequences. The proposed sequence representation includes the predicted structural content, which provides the strongest contribution towards classification, together with composition and composition moment vectors, hydrophobic autocorrelations, chemical group composition, and the molecular weight of the protein. On average, the predicted content values improve the prediction accuracy by 3.3% and 4.2% for the 4 and 11 classes, respectively, compared with a sequence representation that does not use this information. Finally, we propose a very compact, 20-dimensional sequence representation that is shown to improve the prediction accuracy by 5.1-8.5% when compared with recently published results.
Keywords :
biology computing; molecular biophysics; molecular configurations; pattern classification; proteins; SCOP database; chemical group composition; composition moment vector; homology sequence; hydrophobic autocorrelation; molecular weight; protein structural content; in silico prediction; twilight zone protein; Accuracy; Amino acids; Autocorrelation; Chemicals; Clustering algorithms; Databases; Prediction methods; Protein engineering; Protein sequence; Vectors
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications, 2006. ICMLA '06. 5th International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
0-7695-2735-3
Type :
conf
DOI :
10.1109/ICMLA.2006.27
Filename :
4041489