Title :
Shared Genomics: Accessible High Performance Computing for Genomic Medical Research
Author :
Delderfield, Mark ; Kitching, Lee ; Smith, Gareth ; Hoyle, David ; Buchan, Iain
Author_Institution :
North West Inst. for BioHealth Inf., Univ. of Manchester, Manchester
Abstract :
The study of the genetic causes of disease is entering a new era. Variations in DNA sequence between individuals at a single position (locus) within the human genome are termed single nucleotide polymorphisms (SNPs), and may lead to a frank disease state or a variation in normal physiology. By comparing and contrasting the genomes of people who have a disease with the genomes of people who don't, we can begin to identify the genetic loci that potentially play a role in the disease. Modern biotechnology allows individuals to be genotyped at hundreds of thousands of genetic loci. Whilst metrics that quantify the statistical importance of a single locus are essentially of low complexity, for example calculation of a χ² statistic, within a genome-wide association study this calculation is repeated at every locus. In addition, the entire computational process is often repeated with a number of randomised data sets, which are needed to estimate statistical significance. The large number of loci, the number of randomised data sets, and the rapid combinatorial increase when analysing multiple SNPs together naturally dictate that a high performance computing (HPC) solution be developed. On a single-core machine, analysis of significant numbers of SNP pairs would take many years. Once statistical analysis of the data has been performed, the results must be annotated with relevant information to aid biological interpretation and hypothesis generation; this is a standard, but not insubstantial, bioinformatic task.
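The per-locus test, its repetition across every locus, and the randomisation step described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the Shared Genomics implementation. It assumes a case-control design, genotypes coded as minor-allele counts (0, 1, 2), an allelic 2×2 χ² test, and a max-statistic (Westfall-Young style) case/control label permutation for genome-wide significance. All function names are hypothetical.

```python
import random


def allelic_chi2(genotypes, is_case):
    """Pearson chi-squared statistic (1 df) on a 2x2 allele-count table.

    Assumption: genotypes are minor-allele counts (0, 1 or 2) per person;
    is_case holds True for cases and False for controls.
    """
    # Rows: case / control; columns: minor allele / major allele.
    table = [[0, 0], [0, 0]]
    for g, case in zip(genotypes, is_case):
        row = 0 if case else 1
        table[row][0] += g
        table[row][1] += 2 - g
    total = sum(sum(r) for r in table)
    row_sums = [sum(r) for r in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:  # skip empty cells, e.g. a monomorphic SNP
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def genome_scan(snps, is_case):
    """Repeat the cheap per-locus test at every locus."""
    return [allelic_chi2(g, is_case) for g in snps]


def max_stat_permutation(snps, is_case, n_perm=100, seed=0):
    """Genome-wide adjusted p-values from randomised (label-shuffled) data sets.

    Each permutation reruns the whole scan, which is why cost grows as
    (number of loci) x (number of permutations).
    """
    rng = random.Random(seed)
    observed = genome_scan(snps, is_case)
    labels = list(is_case)
    exceed = [0] * len(snps)
    for _ in range(n_perm):
        rng.shuffle(labels)
        perm_max = max(genome_scan(snps, labels))
        for k, stat in enumerate(observed):
            if perm_max >= stat:
                exceed[k] += 1
    # Add-one correction so no p-value is reported as exactly zero.
    return observed, [(e + 1) / (n_perm + 1) for e in exceed]


def n_snp_pairs(n_loci):
    """Number of distinct SNP pairs for a two-locus analysis."""
    return n_loci * (n_loci - 1) // 2


if __name__ == "__main__":
    # Toy data: 3 cases followed by 3 controls, 2 SNPs.
    is_case = [True, True, True, False, False, False]
    snps = [
        [2, 2, 1, 0, 0, 1],  # allele frequencies differ between groups
        [1, 0, 1, 1, 0, 1],  # identical allele counts in both groups
    ]
    stats, pvals = max_stat_permutation(snps, is_case, n_perm=200)
    print(stats, pvals)
    print(n_snp_pairs(500_000))  # assumed chip size, for illustration
```

With an assumed 500,000 genotyped loci, `n_snp_pairs` gives about 1.25 × 10^11 pairs, and each pair would need its own test in every randomised data set, which illustrates why a single-core analysis of SNP pairs is impractical.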
Keywords :
DNA; biology computing; genomics; molecular biophysics; statistical analysis; DNA sequence; genomic medical research; randomized data sets; shared genomics; single nucleotide polymorphisms; statistical analysis; substantial bioinformatic task; Bioinformatics; Biotechnology; DNA; Diseases; Genetics; Genomics; High performance computing; Humans; Physiology; Sequences; HPC; SNPs; bioinformatics; genetics; genome-wide association;
Conference_Title :
eScience, 2008. eScience '08. IEEE Fourth International Conference on
Conference_Location :
Indianapolis, IN
Print_ISBN :
978-1-4244-3380-3
Electronic_ISBN :
978-0-7695-3535-7
DOI :
10.1109/eScience.2008.132