Title :
Protein model assessment using extended fuzzy decision tree with spatial neighborhood features
Author :
Chida, Anjum ; Harrison, Robert ; Zhang, Yan-Qing
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
Abstract :
Automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and most researched fields in bioinformatics. We attempt to solve this problem using a machine learning technique and information from both the sequence and the structure of the protein. Information such as the amino acid substitution matrix, polarity, secondary structure, and relative distance between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This ensures that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. The goal is to train a machine that learns from PDB structures and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). An improved fuzzy decision tree algorithm is used to build the machine; it is favored over other machine learning techniques for the rules it generates and for its high prediction accuracy. Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Using the fuzzy decision tree, we obtained a training accuracy of around 90%. Prediction accuracy and execution time both improve significantly compared with the previous encoding technique. This outcome motivates us to continue exploring effective machine learning algorithms for accurate protein model quality assessment.
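The spatial-neighborhood encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name neighborhood_vector, the POLAR residue set, the choice of k nearest neighbors, and the layout of the feature vector are all assumptions made for illustration, and only polarity and alpha carbon distance are encoded here (the substitution matrix and secondary structure features are omitted).

```python
import math

# Assumption: a simple binary polarity table (one-letter residue codes).
# The paper's actual polarity encoding is not given in the abstract.
POLAR = set("RNDQEHKSTY")


def neighborhood_vector(index, residues, ca_coords, k=3):
    """Build a feature vector for one residue from its k spatially
    nearest residues, measured between alpha carbon (CA) coordinates.

    Hypothetical layout: [polarity(center),
                          polarity(n1), dist(n1), ..., polarity(nk), dist(nk)]
    """
    center = ca_coords[index]
    # Distance from the central CA atom to every other CA atom.
    distances = [
        (math.dist(center, coord), j)
        for j, coord in enumerate(ca_coords)
        if j != index
    ]
    distances.sort()  # nearest in 3D space first, not nearest in sequence

    vector = [1.0 if residues[index] in POLAR else 0.0]
    for dist, j in distances[:k]:
        vector.append(1.0 if residues[j] in POLAR else 0.0)
        vector.append(dist)
    return vector


# Toy example: four residues with made-up CA coordinates (in angstroms).
residues = "AKLS"
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 5.0, 0.0), (10.0, 0.0, 0.0)]
example = neighborhood_vector(0, residues, ca_coords, k=2)
```

Sorting by 3D distance rather than sequence position is what lets residues that are far apart in the chain but close in the folded structure contribute to the same training vector.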
Keywords :
bioinformatics; biological techniques; decision trees; fuzzy reasoning; learning (artificial intelligence); molecular biophysics; proteins; 3D protein structures; alpha carbon atom distance; amino acid sequence; amino acid substitution matrix; automatic prediction; bioinformatics; extended fuzzy decision tree; machine learning technique; polarity; protein model assessment; protein sequence; relative distance; secondary structure information; spatial neighborhood features; training accuracy; training vector; vector formation; Accuracy; Amino acids; Carbon; Decision trees; Encoding; Proteins; Vectors; decision tree; feature selection; fuzzy ID3; machine learning; protein 3D structures; protein model assessment;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2012 IEEE Symposium on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-1190-8
DOI :
10.1109/CIBCB.2012.6217211