DocumentCode :
2882932
Title :
Searching Genomic Databases using the Prime Factor Filter
Author :
Pears, Russel ; Ee, Jimmy
Author_Institution :
Auckland Univ. of Technol., Auckland
fYear :
2006
fDate :
15-17 Dec. 2006
Firstpage :
301
Lastpage :
306
Abstract :
The major bottleneck in searching genomic databases is the sheer size of the databases involved. A number of solutions to the problem of aligning query sequences to genomic databases have been proposed, including the widely used BLAST and FASTA systems. While such systems are effective for traditional applications such as query alignment, they do not scale well to applications such as whole-genome shotgun sequencing and all-versus-all comparisons of one organism against another. The latter application has quadratic time complexity in the size of the databases involved and requires a different approach from BLAST-type search engines, which rely on a linear scan of the database. Our approach relies on a two-stage filter to prune a significant fraction of the database prior to alignment. The filter uses the MRS index [8] as the first stage, followed by a novel indexing scheme that we propose in this paper. The MRS index screens sequences that map to the same frequency vector and has been shown to produce speedups of up to 12 times over systems that do not employ such an index. However, the MRS index is inadequate against sequences that are inherently different yet still map to the same frequency vector. Our filter, based on the prime factor indexing scheme, is successful in eliminating a large fraction of the false positives that survive the MRS index. Our experiments show that at least 75% of the false positives are eliminated, resulting in speedups of up to 5 times over the MRS indexing scheme.
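The false-positive problem described in the abstract can be illustrated with a minimal sketch. It assumes that a frequency vector is simply the count of each nucleotide (A, C, G, T) in a sequence; the function name `frequency_vector` and the example sequences are hypothetical and do not come from the paper. The sketch shows only why a frequency-based screen can let dissimilar sequences through, not how the prime factor filter itself works.

```python
from collections import Counter

# Assumed DNA alphabet; the order fixes the layout of the vector.
ALPHABET = "ACGT"


def frequency_vector(seq):
    """Return the count of each base in `seq` as a tuple (A, C, G, T)."""
    counts = Counter(seq)
    return tuple(counts[base] for base in ALPHABET)


# Two hypothetical sequences with different base order
# but identical base counts.
a = "AACCGGTT"
b = "TGCATGCA"

# Both map to the same frequency vector, so a screen based only on
# frequency vectors cannot tell them apart.
print(frequency_vector(a), frequency_vector(b))
print(frequency_vector(a) == frequency_vector(b))
```

Because the two sequences share a frequency vector while differing in order, a second-stage filter of the kind the abstract proposes would be needed to separate them before alignment.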
Keywords :
biology computing; database indexing; genetics; query formulation; search engines; MRS index screen; genomic database; indexing scheme; prime factor filter; query sequence; search engine; Bioinformatics; Databases; Filters; Frequency; Genomics; Indexes; Indexing; Organisms; Search engines; Sequences; Genomic Databases; MRS Index; Prime Factor Index; sequence alignment;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Automation, 2006. ICIA 2006. International Conference on
Conference_Location :
Shandong
Print_ISBN :
1-4244-0555-6
Electronic_ISBN :
1-4244-0555-6
Type :
conf
DOI :
10.1109/ICINFA.2006.374137
Filename :
4250227