Title :
Variable selection in statistical models using population-based incremental learning with applications to genome-wide association studies
Author :
Nguyen, Hien Duy ; Wood, Ian A.
Author_Institution :
Sch. of Math. & Phys., Univ. of Queensland, St. Lucia, QLD, Australia
Abstract :
Variable selection is the problem of choosing the subset of explanatory variables for a regression or classification model such that the resulting model is best according to some criterion. Here we consider the use of population-based incremental learning (PBIL) to select the variables for a linear regression model to predict a quantitative trait in living organisms. The data here is simulated to represent a genome-wide association study (GWAS) using single nucleotide polymorphisms (SNPs) as explanatory variables and height as an example trait. PBIL was effective in optimizing a variety of model fitness criteria. The resulting models were found to have true positive and false negative rates comparable to those of competing methods.
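The abstract names the technique but not its details, so the following is only a minimal sketch of how PBIL can drive variable selection for a linear regression model. Everything specific here is an assumption rather than the authors' method: the fitness criterion (BIC of an ordinary least squares fit), the update rule (move the probability vector toward the best individual of each generation), the parameter values, and the hypothetical function names `bic` and `pbil_select`.

```python
import numpy as np

def bic(X, y, mask):
    """BIC of an ordinary least squares fit on the columns flagged in mask.

    Assumed fitness criterion; lower is better.
    """
    n = len(y)
    # Design matrix: an intercept column plus the selected explanatory variables.
    A = np.column_stack([np.ones(n), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

def pbil_select(X, y, pop_size=50, generations=100, lr=0.1, seed=0):
    """Hypothetical PBIL loop over binary inclusion vectors (one bit per variable)."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    p = np.full(m, 0.5)                     # inclusion probability per variable
    best_mask, best_score = None, np.inf
    for _ in range(generations):
        # Sample a population of candidate subsets from the probability vector.
        pop = rng.random((pop_size, m)) < p
        scores = np.array([bic(X, y, ind) for ind in pop])
        i = int(np.argmin(scores))
        if scores[i] < best_score:
            best_mask, best_score = pop[i].copy(), scores[i]
        # Shift the probability vector toward this generation's best subset.
        p = (1 - lr) * p + lr * pop[i]
        # Assumed safeguard: keep probabilities away from 0 and 1.
        p = np.clip(p, 0.02, 0.98)
    return best_mask, best_score, p
```

With a matrix `X` of SNP genotypes coded 0, 1, 2 and a trait vector `y`, calling `pbil_select(X, y)` returns the best subset found, its score, and the final probability vector. Swapping `bic` for another criterion changes only the fitness evaluation.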
Keywords :
bioinformatics; data mining; learning (artificial intelligence); pattern classification; regression analysis; GWAS; PBIL; SNP; classification model; genome-wide association studies; linear regression model; living organisms; machine learning; model fitness criteria; population-based incremental learning; single nucleotide polymorphisms; statistical models; variable selection; Accuracy; Biological system modeling; Input variables; Linear regression; Predictive models; Prototypes; Vectors
Conference_Title :
Evolutionary Computation (CEC), 2012 IEEE Congress on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-1510-4
Electronic_ISBN :
978-1-4673-1508-1
DOI :
10.1109/CEC.2012.6256577