Title :
UPC++ for bioinformatics: A case study using genome-wide association studies
Author :
Kassens, Jan C. ; Gonzalez-Dominguez, Jorge ; Wienbrandt, Lars ; Schmidt, Benedikt
Author_Institution :
Dept. of Comput. Sci., Christian-Albrechts-Univ. of Kiel, Kiel, Germany
Abstract :
Modern genotyping technologies can obtain up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as SNP-SNP interactions, in genome-wide association studies (GWAS) is an important but time-consuming task, since a statistical test must be performed for every pair of measured markers. Consequently, a variety of HPC architectures have been used to accelerate these studies. In this work we present a parallel approach for multi-core clusters, which is implemented with UPC++ and takes advantage of features of both the Partitioned Global Address Space (PGAS) and object-oriented programming models. Our solution is based on a well-known regression model (used by the popular BOOST tool) to test SNP-pair interactions. Experimental results show that UPC++ is suitable for parallelizing data-intensive bioinformatics applications on clusters. For instance, it reduces the time to analyze a real-world dataset with more than 500,000 SNPs and 5,000 individuals from several days on a single core to less than one minute on 512 nodes (12,288 cores) of a Cray XC30 supercomputer.
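The pairwise workload described in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' UPC++ code. It assumes genotypes coded 0/1/2 and a binary case/control phenotype, enumerates all unordered SNP pairs by a linear index, splits that index range into contiguous blocks per worker (a simple static scheme chosen here for illustration, not necessarily the paper's distribution), and builds the 3x3x2 genotype-by-phenotype contingency table of the kind a BOOST-style test is computed from. The names num_pairs, pair_from_index, block_range and contingency_table are hypothetical. For 500,000 SNPs, num_pairs yields 124,999,750,000 pairs, which is why exhaustive testing is so costly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Number of unordered SNP pairs (i, j) with i < j.
std::uint64_t num_pairs(std::uint64_t n_snps) {
    return n_snps * (n_snps - 1) / 2;
}

// Map a linear pair index k in [0, num_pairs(n)) to the pair (i, j), i < j,
// enumerating pairs row by row: (0,1), (0,2), ..., (0,n-1), (1,2), ...
// This loop is O(n); a worker would call it once for its first pair.
std::pair<std::uint64_t, std::uint64_t> pair_from_index(std::uint64_t k, std::uint64_t n) {
    assert(n >= 2 && k < num_pairs(n));
    std::uint64_t i = 0;
    std::uint64_t row_len = n - 1;  // number of pairs whose first SNP is i
    while (k >= row_len) {
        k -= row_len;
        ++i;
        --row_len;
    }
    return {i, i + 1 + k};
}

// Hypothetical static block distribution: worker w (of p workers) gets the
// contiguous half-open range [begin, end) of linear pair indices.
std::pair<std::uint64_t, std::uint64_t> block_range(std::uint64_t total,
                                                    std::uint64_t w,
                                                    std::uint64_t p) {
    return {total * w / p, total * (w + 1) / p};
}

// 3x3x2 contingency table for one SNP pair:
// table[genotype of SNP a][genotype of SNP b][phenotype],
// genotypes coded 0/1/2, phenotype 0 = control, 1 = case.
using Table = std::array<std::array<std::array<std::uint32_t, 2>, 3>, 3>;

Table contingency_table(const std::vector<std::uint8_t>& snp_a,
                        const std::vector<std::uint8_t>& snp_b,
                        const std::vector<std::uint8_t>& phenotype) {
    Table t{};  // value-initialized: all counts start at zero
    for (std::size_t s = 0; s < phenotype.size(); ++s) {
        ++t[snp_a[s]][snp_b[s]][phenotype[s]];  // one count per individual
    }
    return t;
}
```

In this sketch each worker would take its range from block_range(num_pairs(n), w, p), locate its first pair with pair_from_index, and then advance (i, j) incrementally, building one contingency table per pair before applying the statistical test.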
Keywords :
C++ language; Cray computers; bioinformatics; genetics; genomics; multiprocessing systems; object-oriented programming; parallel architectures; regression analysis; BOOST tool; Cray XC30 supercomputer; HPC architectures; SNP-SNP interactions; SNP-pairs interactions; UPC++; data-intensive bioinformatics applications; epistasis detection; genetic markers; genome-wide association studies; modern genotyping technologies; multicore clusters; object oriented programming models; parallel approach; partitioned global address space; real-world dataset; regression model; Bioinformatics; Computational modeling; Diseases; Electronics packaging; Genetics; Object oriented modeling; Optimization; GWAS; PGAS; UPC++
Conference_Title :
Cluster Computing (CLUSTER), 2014 IEEE International Conference on
Conference_Location :
Madrid
DOI :
10.1109/CLUSTER.2014.6968770