Title :
Hybrid feature selection and peptide binding affinity prediction using an EDA based algorithm
Author :
Shelke, Kalpesh ; Jayaraman, Sundaresan ; Ghosh, Sudip ; Valadi, Jayaraman
Author_Institution :
Center for Modeling & Simulation, Univ. of Pune, Pune, India
Abstract :
Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem with protein datasets is their large number of features, which increases the complexity of classification models. The process of drug discovery often involves the use of quantitative structure-activity relationship (QSAR) models to identify chemical structures that could have good inhibitory effects on specific targets and low toxicity (non-specific activity). QSAR models are regression or classification models used in the chemical and biological sciences. Because of this high dimensionality, feature selection becomes unavoidable. In this study, we therefore employ a hybrid Estimation of Distribution Algorithm (EDA) based filter-wrapper methodology to simultaneously extract informative feature subsets and build robust QSAR models. The performance of the algorithm was tested on the benchmark classification challenge datasets obtained from the CoePRa competition platform, developed in 2006. Our results clearly demonstrate the efficacy of the hybrid EDA filter-wrapper algorithm in comparison with previously reported results.
Keywords :
biology computing; feature extraction; filtering theory; genetic algorithms; genomics; pattern classification; proteins; regression analysis; CoePRa competition platform; EDA based algorithm; chemical structure identification; classification models; drug discovery process; estimation of distribution algorithm; feature selection problem; feature vectors; functional genomics; high dimensionality problems; hybrid EDA filter-wrapper algorithm; hybrid feature selection; informative feature subset extraction; inhibitory effects; peptide binding affinity prediction; probabilistic model building genetic algorithm; protein function prediction; protein sequences; quantitative structure-activity relationship models; regression models; robust QSAR models; toxicity; Classification algorithms; Feature extraction; Prediction algorithms; Probability distribution; Radio frequency; Support vector machines; Vectors; Estimation of Distribution Algorithms; Feature Selection; Protein Function Prediction; Weighted Feature Ranking;
Conference_Title :
Evolutionary Computation (CEC), 2013 IEEE Congress on
Conference_Location :
Cancun
Print_ISBN :
978-1-4799-0453-2
Electronic_ISBN :
978-1-4799-0452-5
DOI :
10.1109/CEC.2013.6557854