• DocumentCode
    174904
  • Title
    Distributed Evaluation of XPath Axes Queries over Large XML Documents Stored in MapReduce Clusters
  • Author
    Šenk, A. ; Valenta, M. ; Benn, W.
  • Author_Institution
    Czech Tech. Univ. FIT, Prague, Czech Republic
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    253
  • Lastpage
    257
  • Abstract
    The MR (MapReduce) framework, a programming model for parallel computation over data stored in a cluster of commodity computers, has established itself as one of the leading solutions for Big Data processing. The framework is also used as a query language in many database systems, because it can process data stored in various unstructured, semi-structured, and structured formats. Although the MR framework can also be used for XML data processing, it does not allow queries to be written in a declarative language such as XPath or XQuery. To overcome this problem, we propose a system that allows XML data to be queried with XPath while evaluating the queries in parallel using the MR framework. First, we introduce a persistent storage that maps XML data into a wide-column store. The proposed mapping enables efficient, distributed data processing. Second, we describe a query processor that translates a subset of the XPath language into MR jobs. Finally, we present tests whose results show the scalability of our system.
  • Keywords
    Big Data; XML; distributed processing; document handling; query processing; MR framework; MapReduce clusters; XML documents; XPath; XQuery; big data processing; commodity computer cluster; database systems; declarative manner; distributed XPath axes queries evaluation; parallel computation; query processor; wide-column store; Computers; Data models; Indexes; Query processing; Scalability; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
  • Conference_Location
    Munich
  • ISSN
    1529-4188
  • Print_ISBN
    978-1-4799-5721-7
  • Type
    conf
  • DOI
    10.1109/DEXA.2014.59
  • Filename
    6974858