DocumentCode :
1244395
Title :
Aligning biological sequences on distributed bus networks: a divisible load scheduling approach
Author :
Min, Wong Han ; Veeravalli, Bharadwaj
Author_Institution :
Data Stage Inst., Agency for Sci. Technol. & Res., Singapore
Volume :
9
Issue :
4
fYear :
2005
Firstpage :
489
Lastpage :
501
Abstract :
In this paper, we design a multiprocessor strategy that exploits the computational characteristics of the algorithms used for biological sequence comparison proposed in the literature. We employ divisible load theory (DLT) that is suitable for handling large scale processing on network based systems. For the first time in the domain of DLT, the problem of aligning biological sequences is attempted. The objective is to minimize the total processing time of the alignment process. In designing our strategy, DLT facilitates a clever partitioning of the entire computation process involved in such a way that the overall time consumed for aligning the sequences is a minimum. The partitioning takes into account the computation speeds of the nodes and the underlying communication network. Since this is a real-life application, the post-processing phase becomes important, and hence we consider propagating the results back in order to generate an exact alignment. We consider several cases in our analysis such as deriving closed-form solutions for the processing time for heterogeneous, homogeneous, and networks with slow links. Further, we attempt to employ a multiinstallment strategy to distribute the tasks such that a higher degree of parallelism can be achieved. For slow networks, our strategy recommends near-optimal solutions. We derive an important condition to identify such cases and propose two heuristic strategies. Also, our strategy can be extended for multisequence alignment by utilizing a clustering strategy such as the Berger-Munson algorithm proposed in the literature. Finally, we use real-life DNA samples of house mouse mitochondrion (Mus Musculus Mitochondrion, NC.001569) consisting of 16 295 residues and the DNA of human mitochondrion (Homo Sapiens Mitochondrion, NC.001807) consisting of 16 571 residues, obtainable from the GenBank , in our rigorous simulation experiments to illustrate all the theoretical findings.
Keywords :
DNA; biology computing; cellular biophysics; molecular biophysics; pattern clustering; processor scheduling; Berger-Munson algorithm; Smith-Waterman algorithm; biological sequence aligning; closed-form solutions; clustering strategy; communication time; computational characteristics; degree of parallelism; distributed bus network; divisible load scheduling approach; divisible load theory; heuristic strategy; house mouse mitochondrion; human mitochondrion; large scale processing handling; multiprocessor strategy design; real-life DNA samples; slow networks; Algorithm design and analysis; Biology computing; Closed-form solution; Communication networks; Computer networks; DNA; Large-scale systems; Process design; Processor scheduling; Sequences; Biological sequences; Smith–Waterman algorithm; communication time; divisible load theory(DLT); sequence alignment; Algorithms; Animals; Computer Communication Networks; Computing Methodologies; DNA; Mice; Sequence Alignment; Sequence Analysis; Signal Processing, Computer-Assisted;
fLanguage :
English
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
Publisher :
ieee
ISSN :
1089-7771
Type :
jour
DOI :
10.1109/TITB.2005.855559
Filename :
1545953
Link To Document :
بازگشت