DocumentCode
2192182
Title
Fast filter-and-refine algorithms for subsequence selection
Author
Ooi, Beng Chin ; Pang, Hwee Hwa ; Wang, Hao ; Wong, Limsoon ; Yu, Cui
Author_Institution
Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore
fYear
2002
fDate
2002
Firstpage
243
Lastpage
254
Abstract
Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the final refinement step. Analytical and experimental studies show that the proposed strategy is capable of reducing response time substantially while incurring only a small space overhead.
Keywords
DNA; biology computing; database indexing; molecular biophysics; query processing; scientific information systems; sequences; string matching; very large databases; DNA sequences; approximate subsequence matching; biology; bitmap indexing structure; data sequence condensing; data sequence encoding; false positive removal; fast filter-and-refine algorithms; gene sequences; index sequence; large DNA sequence databases; large sequence databases; protein sequences; query processing; query string; response time; small space overhead; subsequence selection; Computer science; DNA; Database systems; Delay; Indexing; Information technology; Matched filters; Proteins; Query processing; Sequences;
fLanguage
English
Publisher
ieee
Conference_Titel
Database Engineering and Applications Symposium, 2002. Proceedings. International
ISSN
1098-8068
Print_ISBN
0-7695-1638-6
Type
conf
DOI
10.1109/IDEAS.2002.1029677
Filename
1029677
Link To Document