Title :
Fast alignment of large genome databases: a demonstration
Author :
Kahveci, Tamer ; Singh, Ambuj K.
Author_Institution :
Dept. of Comput. Sci., California Univ., Santa Barbara, CA, USA
Abstract :
We demonstrate an efficient algorithm for alignment of large genome strings. Our algorithm constructs a Boolean match table for a given query string and database string with the help of the MRS index structure. The size of the MRS index structure is approximately 1-2% of that of database. Each entry of the match table corresponds to a query/database substring pair. An entry in the match table is marked as True if the corresponding query substring and database substring potentially contain similar patterns. It is marked as False otherwise. The size of the match table is negligible compared to that of database. Once the match table is computed, we build hash tables on these strings. Once the hash table of a string is constructed the marked substrings of other string are read sequentially and exactly matching substrings of the prespecified size are found using this hash table. We call this technique MAP (match table based pruning). Experimental results show that MAP runs up to 97 times faster than BLAST.
Keywords :
database management systems; genetics; query processing; string matching; Boolean match table; MRS index structure; genome database; hash table; match table based pruning; query string; string matching; Bioinformatics; Biological information theory; Databases; Genetics; Genomics; Humans; Indexes; Medical treatment; Organisms; Phylogeny;
Conference_Titel :
Data Engineering, 2003. Proceedings. 19th International Conference on
Print_ISBN :
0-7803-7665-X
DOI :
10.1109/ICDE.2003.1260862