Title :
An IDC-based algorithm for efficient homology filtration with guaranteed seriate coverage
Author :
Lee, Hsiao Ping ; Tsai, Yin Te ; Shih, Ching Hua ; Sheu, Tzu Fang ; Tang, Chuan Yi
Author_Institution :
Dept. of Comput. Sci., Nat. Tsing Hua Univ., Taiwan
Abstract :
Homology search in genomic databases is a fundamental and crucial task for biological knowledge discovery. As database sizes and access volumes grow exponentially, filtration, which filters out impossible homology candidates to reduce the time spent on homology verification, becomes increasingly important in bioinformatics. Most known gram-based filtration approaches in the literature, such as QUASAR, have limited error tolerance and may produce more false positives. In this paper, we present an IDC-based lossless filtration algorithm with guaranteed seriate coverage and error tolerance for efficient homology discovery. In our method, the original problem of homology extraction with the requested seriate coverage and error levels is transformed into a longest increasing subsequence problem with range constraints, for which we propose an efficient algorithm. Experimental results show that the method significantly outperforms QUASAR: at comparable sensitivity levels, our homology filter makes discovery more than three orders of magnitude faster than QUASAR and more than four orders of magnitude faster than exhaustive search.
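The abstract does not define the range constraints, so the following is only a minimal sketch of what a "longest increasing subsequence with range constraints" can look like under assumed definitions. Each hit is a shared gram at (query position, database position). A valid chain must strictly increase in both coordinates. It must also satisfy two hypothetical limits: consecutive query positions may differ by at most `max_gap`, and the diagonal (database position minus query position) may drift by at most `max_shift`. The function name `longest_constrained_chain` and both parameters are illustrative assumptions. The sketch is a plain O(n^2) dynamic program, not the authors' more efficient algorithm.

```python
from typing import List, Tuple

# A hit is a shared gram: (query position, database position).
Hit = Tuple[int, int]


def longest_constrained_chain(hits: List[Hit], max_gap: int, max_shift: int) -> List[Hit]:
    """Hypothetical illustration of LIS with range constraints.

    Returns a longest chain of hits where both coordinates strictly increase,
    consecutive query positions differ by at most max_gap, and consecutive
    diagonals (db_pos - query_pos) differ by at most max_shift.
    Quadratic dynamic programming; not the paper's algorithm.
    """
    hits = sorted(hits)
    n = len(hits)
    if n == 0:
        return []
    best = [1] * n   # best[i]: length of the longest valid chain ending at hits[i]
    prev = [-1] * n  # prev[i]: predecessor of hits[i] in that chain (-1 = none)
    for i in range(n):
        qi, di = hits[i]
        for j in range(i):
            qj, dj = hits[j]
            increasing = qj < qi and dj < di
            within_gap = qi - qj <= max_gap
            within_shift = abs((di - qi) - (dj - qj)) <= max_shift
            if increasing and within_gap and within_shift and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Walk back from the end of a longest chain (first maximum on ties).
    end = max(range(n), key=lambda k: best[k])
    chain: List[Hit] = []
    while end != -1:
        chain.append(hits[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    hits = [(0, 10), (3, 13), (5, 15), (6, 30), (9, 20), (12, 22)]
    # The off-diagonal hit (6, 30) is excluded by the shift limit.
    print(longest_constrained_chain(hits, max_gap=4, max_shift=2))
```

With `max_gap=4` and `max_shift=2`, the example prints `[(0, 10), (3, 13), (5, 15), (9, 20), (12, 22)]`. Tightening `max_shift` to 0 shortens the chain to the first three hits, which shows how a stricter error level reduces the coverage a candidate region can claim.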
Keywords :
biology computing; data mining; database indexing; genetics; molecular biophysics; proteins; IDC-based algorithm; biological knowledge discovery; error levels; genomic databases; homology extraction; homology filtration; seriate coverage; Bioinformatics; Computer science; DNA; Databases; Evolution (biology); Filters; Filtration; Genomics; Proteins; Sequences;
Conference_Title :
Bioinformatics and Bioengineering, 2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on
Print_ISBN :
0-7695-2173-8
DOI :
10.1109/BIBE.2004.1317370