DocumentCode :
3542878
Title :
Fast and parallelized greedy forward selection of genetic variants in Genome-wide association studies
Author :
Okser, Sebastian ; Pahikkala, Tapio ; Airola, Antti ; Aittokallio, Tero ; Salakoski, Tapio
Author_Institution :
Turku Centre for Comput. Sci., Univ. of Turku, Turku, Finland
fYear :
2011
fDate :
4-6 Dec. 2011
Firstpage :
214
Lastpage :
217
Abstract :
We present the application of a regularized least-squares-based algorithm, known as greedy RLS, to perform wrapper-based feature selection on an entire genome-wide association dataset. Wrapper methods were previously thought to be computationally infeasible for these types of studies. The running time of the method grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. Moreover, we show how it can be further accelerated using parallel computation on multi-core processors. We tested the method on the Wellcome Trust Case Control Consortium's (WTCCC) Type 2 Diabetes - UK National Blood Service dataset consisting of 3,382 subjects and 404,569 single nucleotide polymorphisms (SNPs). Our method is capable of high-speed feature selection, selecting the top 100 predictive SNPs in under five minutes on a high-end desktop, and it outperforms typical filter approaches in terms of predictive performance.
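The wrapper idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' greedy RLS implementation: it is a naive greedy forward selection that refits a regularized least-squares (ridge) model for every candidate feature, so it does not have the linear running time the abstract reports. The leave-one-out squared error as selection criterion, the function names, and the regularization parameter `lam` are all assumptions made for illustration.

```python
import numpy as np

def loo_squared_error(X, y, lam):
    """Leave-one-out squared error of ridge regression (no intercept).

    Uses the standard hat-matrix shortcut, so no model is refit per
    held-out example. `lam` is an assumed regularization parameter.
    """
    n, d = X.shape
    # Hat matrix H = X (X^T X + lam I)^-1 X^T
    G = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(G, X.T)
    residual = y - H @ y
    loo_residual = residual / (1.0 - np.diag(H))
    return float(np.sum(loo_residual ** 2))

def greedy_forward_selection(X, y, k, lam=1.0):
    """Naive wrapper: add, one at a time, the feature that most
    reduces the leave-one-out error of the ridge model (illustrative only)."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            err = loo_squared_error(X[:, selected + [j]], y, lam)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Calling `greedy_forward_selection(X, y, k)` on a NumPy feature matrix `X` and target vector `y` returns the indices of the `k` chosen columns in selection order. Each candidate evaluation here is independent of the others, which is the kind of loop that lends itself to the multi-core parallelization the abstract mentions.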
Keywords :
biology computing; genetics; genomics; greedy algorithms; learning (artificial intelligence); least squares approximations; multiprocessing systems; UK National Blood Service dataset; Wellcome Trust Case Control Consortium Type 2 Diabetes; fast greedy forward selection; genetic variants; genome-wide association dataset; genome-wide association studies; greedy RLS; machine learning; multicore processors; parallel computation; parallelized greedy forward selection; regularized least-squares based algorithm; single nucleotide polymorphisms; wrapper-based feature selection; Bioinformatics; Diseases; Genomics; Machine learning; Predictive models; Training; GWAS; Machine learning; SNP; feature selection; genome-wide association study; regularized least squares;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on
Conference_Location :
San Antonio, TX
ISSN :
2150-3001
Print_ISBN :
978-1-4673-0491-7
Electronic_ISBN :
2150-3001
Type :
conf
DOI :
10.1109/GENSiPS.2011.6169483
Filename :
6169483