Title :
Feature Selection Approach for Quantitative Prediction of Transcriptional Activities
Author :
Anand, Ashish ; Fogel, G. ; Tang, E. Ke ; Suganthan, P.N.
Author_Institution :
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore
Abstract :
Protein-DNA interactions play a crucial role in transcriptional regulation and other biological processes. Quantitative predictive models of protein-DNA binding affinities can increase our understanding of molecular interactions and help validate putative transcription factor binding sites or other regulatory features. Such predictive models must take into account context-specific features associated with both DNA and proteins. Given the high complexity associated with such features, here we consider only the contextual features of DNA associated with binding affinity. Two types of features are considered in this paper: 1) features accounting for conformational and physico-chemical properties of the nucleotide sequence and 2) features accounting for conservation of evolutionary information in the form of position-specific weight matrices. A feature selection approach, named leave-one-out sequential forward selection (LOOSFS), is presented. The feature selection method employs the leave-one-out cross-validation error of a least squares support vector machine (LS-SVM) to estimate the test error of the quantitative prediction model. The method is used to identify important features possibly responsible for differences in the transcriptional activities of 130 DNA sequences. These sequences were obtained by single base substitutions within the promoter of the mouse beta-major globin gene. The selected features and predicted activity values correlate well with experimental results.
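The selection loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the RBF kernel, the hyperparameters `gamma` and `sigma`, the explicit refit-per-sample leave-one-out loop, the stop-when-no-improvement rule, and all function names are assumptions, since the abstract does not specify them. The data at the bottom are synthetic and only exercise the code.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def loo_mse(X, y, gamma=10.0, sigma=1.0):
    # Explicit leave-one-out: refit without sample i, then predict sample i.
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b, alpha = lssvm_fit(X[keep], y[keep], gamma, sigma)
        pred = lssvm_predict(X[keep], b, alpha, X[i:i + 1], sigma)[0]
        errs[i] = (pred - y[i]) ** 2
    return errs.mean()

def loosfs(X, y, max_features=None, gamma=10.0, sigma=1.0):
    # Greedy forward selection: add the feature that most lowers the LOO error;
    # stop (an assumed rule) when no remaining candidate improves it.
    n_features = X.shape[1]
    max_features = max_features or n_features
    selected, best_err = [], np.inf
    remaining = list(range(n_features))
    while remaining and len(selected) < max_features:
        err, j = min((loo_mse(X[:, selected + [k]], y, gamma, sigma), k)
                     for k in remaining)
        if err >= best_err:
            break
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err

# Synthetic check: only column 2 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = 2.0 * X[:, 2] + 0.1 * rng.normal(size=30)
selected, best_err = loosfs(X, y)
print(selected, best_err)
```

The explicit refit per held-out sample keeps the sketch easy to follow; for larger data sets a closed-form leave-one-out estimate for LS-SVM would avoid the repeated linear solves.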
Keywords :
DNA; biology computing; proteins; support vector machines; DNA sequences; biological process; feature selection; least square support vector machines; leave-one-out sequential forward selection; molecular interaction; protein-DNA binding affinities; protein-DNA interactions; quantitative prediction; transcriptional activities; Accuracy; Biological processes; Biological system modeling; Predictive models; Protein engineering; Sequences; Stochastic processes; Testing
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on
Conference_Location :
Toronto, Ont.
Print_ISBN :
1-4244-0623-4
Electronic_ISBN :
1-4244-0624-2
DOI :
10.1109/CIBCB.2006.331012