Title :
EAGRE: Towards scalable I/O efficient SPARQL query evaluation on the cloud
Author :
Xiaofei Zhang ; Lei Chen ; Yongxin Tong ; Min Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., HKUST, Kowloon, China
Abstract :
To benefit from the Cloud platform's unlimited resources, managing and evaluating huge volumes of RDF data in a scalable manner has attracted intensive research efforts recently. Progress has been made on evaluating SPARQL queries with either high-level declarative programming languages, like Pig [1], or a sequence of carefully designed MapReduce jobs, both of which tend to answer the query with multiple join operations. However, due to the simplicity of Cloud storage and the coarse organization of RDF data in existing solutions, multiple join operations easily incur significant I/O and network traffic, which can severely degrade system performance. In this work, we first propose EAGRE, an Entity-Aware Graph compREssion technique to form a new representation of RDF data on Cloud platforms, based on which we propose an I/O efficient strategy to evaluate SPARQL queries as quickly as possible, especially queries with specified solution sequence modifiers, e.g., PROJECTION, ORDER BY, etc. We implement a prototype system and conduct extensive experiments over both real and synthetic datasets on an in-house cluster. The experimental results show that our solution can achieve over an order of magnitude of time saving for SPARQL query evaluation compared to state-of-the-art MapReduce-based solutions.
Keywords :
cloud computing; data compression; data handling; query languages; query processing; EAGRE; Entity-Aware Graph compREssion technique; MapReduce job; MapReduce-based solution; ORDER BY; PROJECTION; Pig; RDF data representation; cloud computing; cloud platform; cloud storage; high-level declarative programming language; in-house cluster; multiple join operation; network traffic; query answering; scalable I/O efficient SPARQL query evaluation; scalable RDF data management; solution sequence modifier; system performance degradation; Data models; Layout; Nickel; Processor scheduling; Query processing; Resource description framework; Scheduling;
Conference_Titel :
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-4909-3
Electronic_ISBN :
1063-6382
DOI :
10.1109/ICDE.2013.6544856