Regional Information Center for Science and Technology

DocumentCode :

2731547

Title :

Propagating Updates in SPIDER

Author :

Koudas, N. ; Marathe, A. ; Srivastava, Divesh

Author_Institution :

Toronto Univ., Ont., Canada

fYear :

2007

fDate :

15-20 April 2007

Firstpage :

1146

Lastpage :

1153

Abstract :

SPIDER, developed at AT&T Labs-Research, is a system that efficiently supports flexible string matching against attribute values in large databases, and is extensively used in AT&T. The scoring methodology is based on tf.idf weighting and cosine similarity, and SPIDER maintains indexes containing string tokens and their weights, for fast matching at query time. Given the "global" nature of the weights maintained in the indexes, even a few updates to the underlying database tables would necessitate a (near-)complete recomputation of the indexes, which can be prohibitively expensive. In this paper, we explore novel techniques to considerably reduce the cost of propagating updates in SPIDER, without a significant degradation of answer accuracy or query performance. We present experimental evidence using real data sets to demonstrate the practical benefits of our techniques.
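
The "global" nature of the weights can be illustrated with a small sketch. This is not SPIDER's implementation: the whitespace tokenizer, the idf formula log(1 + N/df), and all function names below are assumptions chosen only to show why inserting a single record changes the normalized tf.idf weights of records that were not touched.

```python
import math
from collections import Counter


def tokenize(s):
    # Hypothetical tokenizer: lowercase, whitespace-separated words.
    return s.lower().split()


def idf_table(records):
    # idf depends on the whole collection: N records, df records per token.
    # log(1 + N/df) is one common variant, assumed here for illustration.
    n = len(records)
    df = Counter()
    for r in records:
        df.update(set(tokenize(r)))
    return {t: math.log(1 + n / d) for t, d in df.items()}


def weight_vector(record, idf):
    # tf.idf weights, normalized to unit length so that the cosine
    # similarity of two vectors is a plain dot product.
    tf = Counter(tokenize(record))
    vec = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(u, v):
    # Dot product of two unit-length sparse vectors.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


records = ["AT&T Labs Research", "AT&T Corp", "Bell Labs"]
before = weight_vector(records[0], idf_table(records))

# Insert one new record; records[0] itself is unchanged.
records.append("Research In Motion")
after = weight_vector(records[0], idf_table(records))

# The weight of "research" in records[0] drops, because the token is now
# less rare across the collection.
print(before["research"], after["research"])
```

In this toy setting the stored weights of an unmodified record shift after a single insertion, which is the effect that makes naive index maintenance expensive.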

Keywords :

indexing; query processing; string matching; very large databases; SPIDER; answer accuracy; database tables; large databases; query performance; string tokens; Costs; Customer relationship management; Databases; Degradation; Delay; Density estimation robust algorithm; Indexes; Information processing; Pressing; Prototypes

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on

Conference_Location :

Istanbul

Print_ISBN :

1-4244-0802-4

Type :

conf

DOI :

10.1109/ICDE.2007.368973

Filename :

4221763

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2731547