Title :
An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence-Search on a Massively Parallel System
Author :
Jiang, Karl ; Thorsen, Oystein ; Peters, Amanda ; Smith, Brian ; Sosa, Carlos P.
Author_Institution :
IBM, Rochester
Abstract :
Bioinformatics databases used for sequence comparison and sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on hidden Markov models (HMMs) have proven to be computationally intensive and, hence, amenable to architectures with multiple processors. In this paper, we describe a modified version of the original parallel implementation of HMMs on a massively parallel system. This implementation is part of the HMMER bioinformatics code. HMMER 2.3.2 uses profile HMMs for sensitive database searching based on statistical descriptions of a sequence family's consensus (Durbin et al., 1998). Two of the nine programs, namely hmmsearch and hmmpfam, were further parallelized to take advantage of the large number of processors. For our study, we start by porting the parallel virtual machine (PVM) versions of these two programs currently available as part of the HMMER suite of programs. We report the performance of these nonoptimized versions as baselines. Our work also includes the introduction of alternate sequence file indexing, a multiple-master configuration, dynamic data collection and, finally, load balancing via the indexed sequence files. This set of optimizations constitutes our modified version for massively parallel systems. Our results show parallel performance improvements of more than one order of magnitude (16 times) for hmmsearch and hmmpfam.
Keywords :
biology computing; database indexing; genetics; hidden Markov models; resource allocation; virtual machines; HMMER 2.3.2; HMMER bioinformatics code; alternate sequence file indexing; bioinformatics databases; database searches; dynamic data collection; genomic sequence search; hmmpfam; hmmsearch; load balancing; massively parallel systems; multiple processors; multiple-master configuration; nonoptimized versions; parallel virtual machine; sensitive database searching; sequence comparison; HMMER; bioinformatics; multiple master parallelization; parallel implementation
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2007.70712