DocumentCode
174904
Title
Distributed Evaluation of XPath Axes Queries over Large XML Documents Stored in MapReduce Clusters
Author
Enk, A. ; Valenta, M. ; Benn, W.
Author_Institution
Czech Tech. Univ. FIT, Prague, Czech Republic
fYear
2014
fDate
1-5 Sept. 2014
Firstpage
253
Lastpage
257
Abstract
The MapReduce (MR) framework, a programming model for parallel computation over data stored in a cluster of commodity computers, has established itself as one of the leading solutions for Big Data processing. The framework is also used as a query language in many database systems, because it can process data stored in various unstructured, semi-structured, and structured formats. Although the MR framework can also be used for XML data processing, it does not allow queries to be written in a declarative manner, as in XPath or XQuery. To overcome this problem, we propose a system that enables XML data to be queried with XPath while evaluating the queries in parallel using the MR framework. First, we introduce a persistent storage that maps XML data into a wide-column store. The proposed mapping enables efficient and distributed data processing. Second, we describe a query processor that translates a subset of the XPath language into MR jobs. Finally, we present tests and their results, which show the scalability of our system.
Keywords
Big Data; XML; distributed processing; document handling; query processing; MR framework; MapReduce clusters; XML documents; XPath; XQuery; big data processing; commodity computer cluster; database systems; declarative manner; distributed XPath axes queries evaluation; parallel computation; query processor; wide-column store; Computers; Data models; Indexes; Query processing; Scalability; XML;
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
Conference_Location
Munich
ISSN
1529-4188
Print_ISBN
978-1-4799-5721-7
Type
conf
DOI
10.1109/DEXA.2014.59
Filename
6974858