DocumentCode :
2082297
Title :
Supercomputing enabling exhaustive statistical analysis of genome wide association study data: Preliminary results
Author :
Reumann, M. ; Makalic, E. ; Goudey, B.W. ; Inouye, M. ; Bickerstaffe, A. ; Bui, M. ; Park, D.J. ; Kapuscinski, M.K. ; Schmidt, D.F. ; Zhou, Zhengchun ; Qian, G. ; Zobel, Justin ; Wagner, Jens ; Hopper, J.L.
Author_Institution :
IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia
fYear :
2012
fDate :
Aug. 28 2012-Sept. 1 2012
Firstpage :
1258
Lastpage :
1261
Abstract :
Most published genome-wide association studies (GWAS) do not examine SNP interactions because of the high computational complexity of computing p-values for the interaction terms. Our aim is to utilize supercomputing resources to apply complex statistical techniques to the world's accumulating GWAS, epidemiology, survival and pathology data to uncover more information about genetic and environmental risk, biology and aetiology. As a proof of principle, we performed the Bayesian Posterior Probability test on a pseudo data set with 500,000 single nucleotide polymorphisms (SNPs) and 100 samples. We carried out strong scaling simulations on 2 to 4,096 processing cores, doubling the partition size at each step. On two processing cores, the run time is 317 h, i.e. almost two weeks, compared to less than 10 minutes on 4,096 processing cores. The speedup factor relative to the two-core run is 2,020, which is very close to the theoretical value of 2,048. This work demonstrates the feasibility of performing exhaustive higher order analysis of GWAS data using independence testing for contingency tables. We are now in a position to employ supercomputers with hundreds of thousands of threads for higher order analysis of GWAS data using complex statistics.
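The scaling figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, the speedup is assumed to be measured relative to the two-core run, and the pair count assumes that "exhaustive" analysis means testing every unordered pair of SNPs.

```python
from math import comb

# All function names below are hypothetical helpers for illustration.

def ideal_speedup(base_cores: int, cores: int) -> float:
    """Ideal strong-scaling speedup relative to a baseline core count."""
    return cores / base_cores

def parallel_efficiency(speedup: float, base_cores: int, cores: int) -> float:
    """Observed speedup as a fraction of the ideal speedup."""
    return speedup / ideal_speedup(base_cores, cores)

def implied_runtime_hours(base_runtime_hours: float, speedup: float) -> float:
    """Run time implied by a baseline run time and a speedup factor."""
    return base_runtime_hours / speedup

# Figures taken from the abstract.
BASE_CORES = 2
MAX_CORES = 4096
BASE_RUNTIME_H = 317.0
REPORTED_SPEEDUP = 2020.0
N_SNPS = 500_000

# Partition sizes used in the strong-scaling runs: 2, 4, ..., 4096.
partitions = [2 ** k for k in range(1, 13)]

# Assumption: exhaustive second-order analysis tests every unordered SNP pair.
n_pairs = comb(N_SNPS, 2)

ideal = ideal_speedup(BASE_CORES, MAX_CORES)
efficiency = parallel_efficiency(REPORTED_SPEEDUP, BASE_CORES, MAX_CORES)
runtime_min = implied_runtime_hours(BASE_RUNTIME_H, REPORTED_SPEEDUP) * 60

print(f"SNP pairs to test:       {n_pairs:,}")
print(f"ideal speedup:           {ideal:.0f}")
print(f"parallel efficiency:     {efficiency:.1%}")
print(f"implied 4,096-core time: {runtime_min:.1f} min")
```

Under these assumptions the reported speedup corresponds to a parallel efficiency of roughly 98.6%, and the implied run time on 4,096 cores is consistent with the "less than 10 minutes" stated in the abstract.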
Keywords :
Bayes methods; biology computing; distributed processing; genomics; statistical analysis; Bayesian posterior probability test; GWAS; aetiology; biology; contingency tables; environmental risk; epidemiology; genetic risk; genome wide association study data; independence testing; pathology data; single nucleotide polymorphism; supercomputing resources; survival data; Bayesian methods; Bioinformatics; Genomics; Runtime; Bayes Theorem; Computational Biology; Computer Simulation; Genome-Wide Association Study; Humans; Monte Carlo Method; Neoplasms; Phenotype; Polymorphism, Single Nucleotide
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE
Conference_Location :
San Diego, CA
ISSN :
1557-170X
Print_ISBN :
978-1-4244-4119-8
Electronic_ISBN :
1557-170X
Type :
conf
DOI :
10.1109/EMBC.2012.6346166
Filename :
6346166