DocumentCode
2971018
Title
Impact of the Predicted Protein Structural Content on Prediction of Structural Classes for the Twilight Zone Proteins
Author
Kurgan, Lukasz A. ; Rahbari, Mandana ; Homaeian, Leila
Author_Institution
Dept. of Electr. & Comput. Eng., Alberta Univ., Edmonton, Alta.
fYear
2006
fDate
Dec. 2006
Firstpage
180
Lastpage
186
Abstract
This paper addresses in silico prediction of protein structural classes as defined in the SCOP database. SCOP defines a total of 11 classes, while the majority of proteins are classified into 4 classes: all-alpha, all-beta, alpha/beta, and alpha+beta. The main goals of this paper are to experimentally evaluate the impact of predicted protein secondary structure content on structural class prediction and to develop a novel protein sequence representation. The experiments apply three protein sequence representations and four classifiers to the prediction of both 4 and 11 structural classes. The predictions are performed using a large dataset of low-homology (twilight zone) sequences. The proposed sequence representation includes the predicted structural content, which provides the strongest contribution towards classification, as well as composition and composition moment vectors, hydrophobic autocorrelations, chemical group composition, and the molecular weight of the protein. The predicted content values are shown, on average, to improve the prediction accuracy by 3.3% and 4.2% for the 4 and 11 classes, respectively, when compared to a sequence representation that does not utilize this information. Finally, we propose a very compact, 20-dimensional sequence representation that is shown to improve the prediction accuracy by 5.1-8.5% when compared with recently published results.
Keywords
biology computing; molecular biophysics; molecular configurations; pattern classification; proteins; SCOP database; chemical group composition; composition moment vector; homology sequence; hydrophobic autocorrelation; molecular weight; protein structural content; in silico prediction; twilight zone protein; Accuracy; Amino acids; Autocorrelation; Chemicals; Clustering algorithms; Databases; Prediction methods; Protein engineering; Protein sequence; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications, 2006. ICMLA '06. 5th International Conference on
Conference_Location
Orlando, FL
Print_ISBN
0-7695-2735-3
Type
conf
DOI
10.1109/ICMLA.2006.27
Filename
4041489