DocumentCode :
2403583
Title :
Exploiting local similarity for indexing paths in graph-structured data
Author :
Kaushik, Raghav ; Shenoy, Pradeep ; Bohannon, Philip ; Gudes, Ehud
fYear :
2002
fDate :
2002
Firstpage :
129
Lastpage :
140
Abstract :
XML and other semi-structured data may have partially specified or missing schema information, motivating the use of a structural summary that can be automatically computed from the data. These summaries also serve as indices for evaluating the complex path expressions common to XML and semi-structured query languages. However, to answer all path queries accurately, summaries must encode information about long, seldom-queried paths, leading to increased size and complexity with little added value. We introduce the A(k)-indices, a family of approximate structural summaries. They are based on the concept of k-bisimilarity, in which nodes are grouped based on local structure, i.e., the incoming paths of length up to k. The parameter k thus smoothly varies the level of detail (and accuracy) of the A(k)-index. For small values of k, the size of the index is substantially reduced. While smaller, the A(k)-index is approximate, and we describe techniques for efficiently extracting exact answers to regular path queries. Our experiments show that, for moderate values of k, path evaluation using the A(k)-index ranges from being very efficient for simple queries to competitive for most complex queries, while using significantly less space than comparable structures.
Keywords :
data structures; database indexing; directed graphs; hypermedia markup languages; query languages; query processing; XML; complex path expressions; data structures; directed graph; experiments; k-bisimilarity; missing schema information; query languages; regular path queries; semi-structured data; structural summary; Data engineering; Data mining; Data models; Data structures; Database languages; Indexing; Query processing; Statistics; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2002. Proceedings. 18th International Conference on
Conference_Location :
San Jose, CA
ISSN :
1063-6382
Print_ISBN :
0-7695-1531-2
Type :
conf
DOI :
10.1109/ICDE.2002.994703
Filename :
994703