Title :
Efficient Algorithms for Context Query Evaluation over a Tagged Corpus
Author :
Jérémy Barbay;Alex López-Ortiz
Author_Institution :
Dept. de Cienc. de la Comput. (DCC), Univ. de Chile, Santiago, Chile
Abstract :
We present an optimal adaptive algorithm for context queries over tagged content. A query locates instances of a tag within a context that the query specifies through patterns built from preorder, ancestor-descendant, and proximity operators on the document tree implied by the tagged content. The time taken to resolve a query $Q$ on a document tree $T$ is logarithmic in the size of $T$ and proportional both to the size of $Q$ and to the difficulty of the combination of $Q$ with $T$, as measured by the minimal size of a certificate of the answer. The algorithm performs no worse than the classical worst-case optimal bound, and provably better on simpler queries and corpora. More formally, the algorithm runs in time $O(\delta k \lg(n/\delta k))$ in the standard RAM model and in time $O(\delta k \lg\lg\min(n,\sigma))$ in the $\Theta(\lg(n))$-word RAM model, where $k$ is the number of edges in the query, $\delta$ is the minimum number of operations required to certify the answer to the query, $n$ is the number of nodes in the tree, and $\sigma$ is the number of labels indexed.
Keywords :
"Query processing","XML","Books","Database languages","Acquired immune deficiency syndrome","Africa","Computer science","Read-write memory","HTML","Adaptive algorithm"
Conference_Title :
2009 International Conference of the Chilean Computer Science Society (SCCC)
Print_ISBN :
978-1-4244-7752-4
DOI :
10.1109/SCCC.2009.16