Title :
Feature Selection for Data Driven Prediction of Protein Model Quality
Author :
Montuori, Alfonso ; Pugliese, Luisa ; Raimondo, Giovanni ; Pasero, Eros
Author_Institution :
Politecnico di Torino, Torino
Abstract :
Selecting features to assess the accuracy of a protein three-dimensional model when only the protein sequence is known is challenging, because it is not clear which features are most important or how they should best be combined. We present the results of an information-theory-based approach for selecting an optimal subset of features for the prediction of protein model quality. The subset was computed with a backward selection procedure, starting from a set of structural features in three categories: atomic interactions, solvent accessibility, and secondary structure. Three statistical-learning approaches were then evaluated for predicting the quality of a protein model from the selected subset: a probabilistic classifier based on kernel probability density estimation (KPDE), a feed-forward artificial neural network (ANN), and a support vector machine (SVM).
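The backward selection procedure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact information-theoretic criterion, so the plug-in mutual information estimate over discretized features used here is an assumption, as are the toy data and the names `mutual_information`, `project`, and `backward_select`. It is not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(rows, labels):
    """Plug-in estimate of I(X; Y) in bits for discrete rows (tuples) and labels."""
    n = len(labels)
    joint = Counter(zip(rows, labels))
    px = Counter(rows)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def project(data, subset):
    """Keep only the feature columns listed in subset."""
    return [tuple(row[i] for i in subset) for row in data]


def backward_select(data, labels, n_keep):
    """Greedy backward elimination (assumed criterion: mutual information).

    Starts from all features and repeatedly drops the one whose removal
    leaves the remaining subset with the highest mutual information.
    """
    selected = list(range(len(data[0])))
    while len(selected) > n_keep:
        best_subset, best_score = None, -1.0
        for feature in selected:
            candidate = [i for i in selected if i != feature]
            score = mutual_information(project(data, candidate), labels)
            if score > best_score:
                best_subset, best_score = candidate, score
        selected = best_subset
    return selected


# Hypothetical toy data: feature 0 copies the label, feature 1 is
# uninformative, feature 2 is constant.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
noise = [0, 1, 0, 1, 1, 0, 0, 1]
data = [(y, z, 0) for y, z in zip(labels, noise)]

kept = backward_select(data, labels, n_keep=1)
print(kept)
```

On this toy set the procedure retains only the informative feature. With real, continuous structural features a discretization step or a density-based estimator would be needed first, and joint estimates over many features require far more samples than this sketch suggests.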
Keywords :
biophysics; estimation theory; learning (artificial intelligence); pattern classification; probability; proteins; statistical analysis; atomic interaction; backward selection; data driven prediction; feature selection; kernel probability density estimation; probabilistic classifier; protein model quality; protein sequence; secondary structure; solvent accessibility; statistical-learning; Bioinformatics; Biological information theory; Biological system modeling; Genomics; Kernel; Predictive models; Protein sequence; Solvents; Support vector machine classification; Support vector machines
Conference_Title :
Neural Networks, 2006. IJCNN '06. International Joint Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9490-9
DOI :
10.1109/IJCNN.2006.247365