DocumentCode :
2192823
Title :
Overcoming the Curse of Dimensionality in a Statistical Geometry Based Computational Protein Mutagenesis
Author :
Masso, Majid
Author_Institution :
Dept. of Bioinf. & Comput. Biol., George Mason Univ., Manassas, VA, USA
fYear :
2010
fDate :
13 Dec. 2010
Firstpage :
719
Lastpage :
725
Abstract :
A computational mutagenesis methodology is detailed whereby each single residue substitution in a protein chain of primary sequence length N is represented as a sparse N-dimensional feature vector, whose M ≪ N nonzero components locally quantify environmental perturbations occurring at the mutated position and its neighbors in the protein structure. The methodology uses both the Delaunay tessellation algorithm, for representing protein structures, and a four-body, knowledge-based statistical contact potential. Feature vectors for the subset of mutants arising from all possible residue substitutions at a particular position occupy the same M-dimensional subspace, where both the value of M and the identities of the M nonzero components depend on the position. The approach is used to characterize a large experimental dataset of single residue substitutions in bacteriophage T4 lysozyme, each categorized as either unaffected or affected based on the measured level of mutant activity relative to that of the native protein. Performance of a single classifier trained with the collective set of mutants in N-space is compared to that of an ensemble of position-specific classifiers trained using disjoint mutant subsets residing in significantly smaller subspaces. The results, based on implementations of supervised classification algorithms, suggest that significant improvements can be achieved through subspace modeling.
Keywords :
biology computing; data mining; mesh generation; pattern classification; proteins; Delaunay tessellation algorithm; bacteriophage T4 lysozyme; computational protein mutagenesis; feature selection; primary sequence length; protein chain; sparse N-dimensional feature vector; statistical contact potential; statistical geometry; supervised classification algorithms; Delaunay tessellation; computational mutagenesis; statistical potential; subspace modeling; supervised classification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
Type :
conf
DOI :
10.1109/ICDMW.2010.35
Filename :
5693367