A Parallel Algorithm for Solving LCS of Multiple Bioseqences

Author

Liu, Wei ; Chen, Ling

Author_Institution

Dept. of Comput. Sci., Yangzhou Univ.

fYear

2006

fDate

13-16 Aug. 2006

Firstpage

4316

Lastpage

4321

Abstract

A fast algorithm named FAST_LCS is presented for the longest common substring (LCS) problem. The algorithm firstly seeks the successors of the initial identical character tuples according to successor tables to obtain all the identical tuples and their levels. Then the result of LCS can be obtained by tracing back from the identical character tuple with the largest level. For n biosequences X ₁, X₂, ..., X_n, the time complexity of sequentially execution of our algorithm is O(L). Here L is the number of identical character tuples. The time complexity of our parallel algorithm is O(|LCS|), where LCS is the length of the longest common substring of X₁, X₂, ... , X_n. This time complexity is independent of the number of sequences n. Some experiments have been conducted on the gene sequences of tigr database using MPP parallel computer Shenteng 1800. The results show that our algorithm is accurate and more efficient than other LCS algorithms

Keywords

biology computing; computational complexity; database management systems; genetics; parallel algorithms; sequences; string matching; Shenteng 1800 MPP parallel computer; gene sequences; longest common substring problem; multiple bioseqences; parallel algorithm; tigr database; time complexity; Bioinformatics; Biology computing; Computer science; Concurrent computing; Cybernetics; DNA; Databases; Machine learning; Parallel algorithms; Proteins; Sequences; Software algorithms; Bioinformatics; identical character tuple; longest common subsequence;

fLanguage

English

Publisher

ieee

Conference_Titel

Machine Learning and Cybernetics, 2006 International Conference on

Conference_Location

Dalian, China

Print_ISBN

1-4244-0061-9

Type

conf

DOI

10.1109/ICMLC.2006.259020

Filename

4028832