مرکز منطقه ای اطلاع رساني علوم و فناوري - Fast alignment of large genome databases: a demonstration

DocumentCode :

2508039

Title :

Fast alignment of large genome databases: a demonstration

Author :

Kahveci, Tamer ; Singh, Ambuj K.

Author_Institution :

Dept. of Comput. Sci., California Univ., Santa Barbara, CA, USA

fYear :

2003

fDate :

5-8 March 2003

Firstpage :

768

Lastpage :

770

Abstract :

We demonstrate an efficient algorithm for alignment of large genome strings. Our algorithm constructs a Boolean match table for a given query string and database string with the help of the MRS index structure. The size of the MRS index structure is approximately 1-2% of that of database. Each entry of the match table corresponds to a query/database substring pair. An entry in the match table is marked as True if the corresponding query substring and database substring potentially contain similar patterns. It is marked as False otherwise. The size of the match table is negligible compared to that of database. Once the match table is computed, we build hash tables on these strings. Once the hash table of a string is constructed the marked substrings of other string are read sequentially and exactly matching substrings of the prespecified size are found using this hash table. We call this technique MAP (match table based pruning). Experimental results show that MAP runs up to 97 times faster than BLAST.

Keywords :

database management systems; genetics; query processing; string matching; Boolean match table; MRS index structure; genome database; hash table; match table based pruning; query string; string matching; Bioinformatics; Biological information theory; Databases; Genetics; Genomics; Humans; Indexes; Medical treatment; Organisms; Phylogeny;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Data Engineering, 2003. Proceedings. 19th International Conference on

Print_ISBN :

0-7803-7665-X

Type :

conf

DOI :

10.1109/ICDE.2003.1260862

Filename :

1260862

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2508039