DocumentCode :
451158
Title :
Improved Parallel and Sequential Walking Tree Methods for Biological String Alignments
Author :
Cull, Paul ; Hsu, Tai
Author_Institution :
Oregon State University
fYear :
1999
fDate :
13-18 Nov. 1999
Firstpage :
51
Lastpage :
51
Abstract :
Approximate string matching is commonly used to align genetic sequences (DNA or RNA) to determine their shared characteristics. Most genetic string matching methods are based on the edit-distance model, which does not provide alignments for inversions and translocations. Recently, a heuristic called the Walking Tree Method [2, 3] has been developed to solve this problem. Unlike other heuristics, it can handle more than one level of inversion, i.e., inversions within inversions. Furthermore, it tends to capture the matched strings´ genes while other heuristics fail. There are two versions of the original walking tree heuristics: the score version gives only the alignment score, the alignment version gives both the score and the alignment mapping between the strings. The score version runs in quadratic time and uses linear space while the alignment version uses an extra log factor for time and space. In this paper, we will briefly describe the walking tree method and the original sequential and parallel algorithms. We will explain why different parallel algorithms are needed for a network of workstations rather than the original algorithm which worked well on a symmetric multi-processor. Our improved parallel method also led to a quadratic time sequential algorithm that uses less space. We give an example of our parallel method. We describe several experiments that show speedup linear in the number of processors, but eventual drop off in speedup as the communication network saturates. For big enough strings, we found linear speedup for all processors we had available. These results suggest that our improved parallel method will scale up as both the size of the problem and the number of processors increase. We include two figures that use real biological data and show that the walking tree methods can find translocations and inversions in DNA sequences and also discover unknown genes.
Keywords :
dynamic programming; genome alignment; heuristic; inversions; parallel algorithm; sequence alignment; translocations; walking tree; Biological system modeling; Computer science; DNA; Genetics; Legged locomotion; Parallel algorithms; Polynomials; RNA; Sequences; Workstations; dynamic programming; genome alignment; heuristic; inversions; parallel algorithm; sequence alignment; translocations; walking tree;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Supercomputing, ACM/IEEE 1999 Conference
Print_ISBN :
1-58113-091-0
Type :
conf
DOI :
10.1109/SC.1999.10003
Filename :
1592694
Link To Document :
بازگشت