Title :
Distributed Sequence Alignment Applications for the Public Computing Architecture
Author :
Pellicer, S. ; Guihai Chen ; Chan, K.C.C. ; Yi Pan
Author_Institution :
Georgia State Univ., Atlanta
fDate :
3/1/2008
Abstract :
The public computer architecture shows promise as a platform for solving fundamental problems in bioinformatics, such as global gene sequence alignment and data mining with tools such as the basic local alignment search tool (BLAST). Our implementations of these two problems on the Berkeley Open Infrastructure for Network Computing (BOINC) platform demonstrate a runtime reduction factor of 1.15 for sequence alignment and 16.76 for BLAST. While the runtime reduction factor of the global gene sequence alignment application is modest, this value is based on a theoretical sequential runtime extrapolated from the calculation of a smaller problem. That extrapolation assumes the calculation runs entirely in memory, which would require 37.3 GB of memory on a single system. With this in mind, the BOINC implementation offers not only reduced runtime but also the aggregated memory of all participant nodes. If an actual sequential run of the problem were compared, a more drastic reduction in runtime would be seen, because a practical single system would incur additional secondary storage I/O overhead. Despite the limitations of the public computer architecture, most notably in communication overhead, it represents a practical platform for grid- and cluster-scale bioinformatics computations today and shows great potential for future implementations.
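The two quantities the abstract relies on, the runtime reduction factor and the memory footprint of an in-memory alignment, can be illustrated with a minimal sketch. It assumes that the runtime reduction factor is the ratio of sequential to distributed runtime, and that the sequential global alignment keeps a full dynamic-programming score matrix with a fixed number of bytes per cell. The function names, the sequence lengths, and the cell size are hypothetical and are not taken from the paper.

```python
def runtime_reduction_factor(sequential_seconds, distributed_seconds):
    """Ratio of sequential to distributed runtime (assumed definition)."""
    if distributed_seconds <= 0:
        raise ValueError("distributed runtime must be positive")
    return sequential_seconds / distributed_seconds


def full_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Memory for a full (len_a + 1) x (len_b + 1) alignment score matrix.

    Assumes every cell is held in memory at once, as in a textbook
    global alignment; bytes_per_cell is a hypothetical cell size.
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the order of magnitude
    # at which a single machine's memory becomes the bottleneck.
    size = full_matrix_bytes(100_000, 100_000)
    print(f"full matrix: {size / 2**30:.2f} GiB")

    # Hypothetical timings, in seconds.
    print(f"reduction factor: {runtime_reduction_factor(10.0, 4.0):.2f}")
```

Because the matrix grows with the product of the two sequence lengths, memory rather than arithmetic tends to limit a single system first, which is why aggregating the memory of participant nodes matters alongside the reduced runtime.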
Keywords :
biology computing; data mining; extrapolation; genetics; BLAST; BOINC platform; Berkeley open infrastructure-for-network computing; basic local alignment search tool; bioinformatics; distributed sequence alignment applications; global gene sequence alignment; public computing architecture; runtime reduction factor; secondary storage I/O overhead; sequential runtime extrapolation; Application software; Bioinformatics; Computer architecture; Computer networks; Computer science; Concurrent computing; Data mining; Distributed computing; Grid computing; Runtime; Basic local alignment search tool (BLAST); Berkeley Open infrastructure for network computing (BOINC); gene sequence alignment; public computer; Algorithms; Database Management Systems; Databases, Factual; Internet; Sequence Alignment; Sequence Analysis; Software
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2008.2000148