DocumentCode :
3717135
Title :
A scalable parallel XQuery processor
Author :
E. Preston Carman;Till Westmann;Vinayak R. Borkar;Michael J. Carey;Vassilis J. Tsotras
Author_Institution :
University of California, Riverside
fYear :
2015
Firstpage :
164
Lastpage :
173
Abstract :
The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To query such large data efficiently and take advantage of parallelism, we have implemented Apache VXQuery, an open-source, scalable XQuery processor. The system builds upon two other open-source frameworks: Hyracks, a parallel execution engine, and Algebricks, a language-agnostic compiler toolbox. Apache VXQuery extends these frameworks and provides an implementation of the XQuery specifics (data model, data-model-dependent functions and optimizations, and a parser). We describe the architecture of Apache VXQuery, its integration with Hyracks and Algebricks, and the XQuery optimization rules applied to the query plan to improve path expression efficiency and to enable query parallelism. An experimental evaluation using a real 500GB dataset with various selection, aggregation, and join XML queries shows that Apache VXQuery performs well in terms of both scale-up and speed-up. Our experiments show that it is about 3.5x faster than Saxon (an XQuery processor available in open-source and commercial editions) on a single 4-core node, and around 2.5x faster than Apache MRQL (a MapReduce-based parallel query processor) on a cluster of eight 4-core nodes.
Keywords :
"XML","Data models","Algebra","Optimization","Aggregates","Parallel processing","Open source software"
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363753
Filename :
7363753