Title :
Massively parallel genomic sequence search on the Blue Gene/P architecture
Author :
Lin, Heshan ; Balaji, Pavan ; Poole, Ruth ; Sosa, Carlos ; Ma, Xiaosong ; Feng, Wu-chun
Author_Institution :
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
Abstract :
This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem - sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes - in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.
Keywords :
bioinformatics; genetics; parallel architectures; parallel machines; scientific information systems; Blue Gene/P architecture; genomic sequence search mapping; genomic sequence search optimization; large-scale bioinformatics problem; massive parallel genomic sequence search code; microbial genome database; supercomputing environment; Bioinformatics; Computer architecture; Computer science; Concurrent computing; Databases; Genomics; Laboratories; Mathematics; Permission; Sequences
Conference_Title :
High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-2834-2
Electronic_ISBN :
978-1-4244-2835-9
DOI :
10.1109/SC.2008.5222005