مرکز منطقه ای اطلاع رساني علوم و فناوري - Fast similarity search in string databases

DocumentCode :

433465

Title :

Fast similarity search in string databases

Author :

Sheu, Simon ; Chang, Alan ; Huang, Webber

Author_Institution :

Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan

Volume :

fYear :

2005

fDate :

28-30 March 2005

Firstpage :

617

Abstract :

Efficient similarity search in large string databases requires effective index support. Since long strings have each numerous substrings of arbitrary length, the effective index designs are of great challenge. The existing solution, namely MRS [T. Kahveci et al., (2001)], employs a low-cost lower bound function to sieve out the most similar candidates from the majority of unlikely database substrings. Therefore, only very small portions of string databases require the expensive true edit distance computation to finalize the query. A significant savings in overall query processing cost can be realized by the filtration feature of lower bound functions. In this paper, we seek to improve MRS to its full potential. Specifically, we propose a very simple method that exchanges the roles of database strings and query string in the original MRS design. Despite simplicity, our solution can further improve the query performance by 10 times in terms of disk page accesses while using only half of the original index´s size.

Keywords :

query formulation; string matching; very large databases; MRS design; disk page access; low-cost lower bound function; query processing; string databases; Bioinformatics; Computer science; Cost function; Genetics; Genomics; Indexes; Information retrieval; Query processing; Research and development management; Spatial databases;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Advanced Information Networking and Applications, 2005. AINA 2005. 19th International Conference on

ISSN :

1550-445X

Print_ISBN :

0-7695-2249-1

Type :

conf

DOI :

10.1109/AINA.2005.185

Filename :

1423558

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=433465