DocumentCode :
2508039
Title :
Fast alignment of large genome databases: a demonstration
Author :
Kahveci, Tamer ; Singh, Ambuj K.
Author_Institution :
Dept. of Comput. Sci., California Univ., Santa Barbara, CA, USA
fYear :
2003
fDate :
5-8 March 2003
Firstpage :
768
Lastpage :
770
Abstract :
We demonstrate an efficient algorithm for alignment of large genome strings. Our algorithm constructs a Boolean match table for a given query string and database string with the help of the MRS index structure. The size of the MRS index structure is approximately 1-2% of that of database. Each entry of the match table corresponds to a query/database substring pair. An entry in the match table is marked as True if the corresponding query substring and database substring potentially contain similar patterns. It is marked as False otherwise. The size of the match table is negligible compared to that of database. Once the match table is computed, we build hash tables on these strings. Once the hash table of a string is constructed the marked substrings of other string are read sequentially and exactly matching substrings of the prespecified size are found using this hash table. We call this technique MAP (match table based pruning). Experimental results show that MAP runs up to 97 times faster than BLAST.
Keywords :
database management systems; genetics; query processing; string matching; Boolean match table; MRS index structure; genome database; hash table; match table based pruning; query string; string matching; Bioinformatics; Biological information theory; Databases; Genetics; Genomics; Humans; Indexes; Medical treatment; Organisms; Phylogeny;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering, 2003. Proceedings. 19th International Conference on
Print_ISBN :
0-7803-7665-X
Type :
conf
DOI :
10.1109/ICDE.2003.1260862
Filename :
1260862
Link To Document :
بازگشت