Title :
Rewriting complex SPARQL analytical queries for efficient cloud-based processing
Author :
Padmashree Ravindra;HyeongSik Kim;Kemafor Anyanwu
Author_Institution :
Microsoft Corporation, Redmond, USA
Abstract :
Many emerging Semantic Web applications combine and aggregate data across domains for analysis. Such analytical queries compute aggregates over multiple groupings of data, resulting in query plans with complex grouping-aggregation constraints. In the context of an RDF analytical query, each such grouping maps to a graph pattern subquery with multiple join operations, and related groupings often result in overlapping graph patterns within the same query. In this paper, we propose a holistic approach to optimizing RDF analytical queries: queries are refactored to achieve shared execution of common subexpressions, which enables parallel evaluation of groupings as well as aggregations. Such a rewriting yields shorter execution workflows, which is particularly beneficial for scale-out processing on distributed Cloud systems with multiple I/O phases. Experiments on real-world and synthetic benchmarks confirm that such a rewriting can achieve more efficient execution plans than relational-style SPARQL query plans executed on popular Cloud systems.
Keywords :
"Resource description framework","Pattern matching","Optimized production technology","Electronic mail","Aggregates","Context"
Conference_Title :
2015 IEEE International Conference on Big Data (Big Data)
DOI :
10.1109/BigData.2015.7363738