DocumentCode :
2731547
Title :
Propagating Updates in SPIDER
Author :
Koudas, N. ; Marathe, A. ; Srivastava, Divesh
Author_Institution :
Toronto Univ., Ont., Canada
fYear :
2007
fDate :
15-20 April 2007
Firstpage :
1146
Lastpage :
1153
Abstract :
SPIDER, developed at AT&T Labs-Research, is a system that efficiently supports flexible string matching against attribute values in large databases, and is extensively used in AT&T. The scoring methodology is based on tf.idf weighting and cosine similarity, and SPIDER maintains indexes containing string tokens and their weights, for fast matching at query time. Given the "global" nature of the weights maintained in the indexes, even a few updates to the underlying database tables would necessitate a (near-)complete recomputation of the indexes, which can be prohibitively expensive. In this paper, we explore novel techniques to considerably reduce the cost of propagating updates in SPIDER, without a significant degradation of answer accuracy or query performance. We present experimental evidence using real data sets to demonstrate the practical benefits of our techniques.
Keywords :
indexing; query processing; string matching; very large databases; SPIDER; answer accuracy; database tables; large databases; query performance; string tokens; Costs; Customer relationship management; Databases; Degradation; Delay; Density estimation robust algorithm; Indexes; Information processing; Pressing; Prototypes
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
Conference_Location :
Istanbul
Print_ISBN :
1-4244-0802-4
Type :
conf
DOI :
10.1109/ICDE.2007.368973
Filename :
4221763