• DocumentCode
    125573
  • Title
    Towards Load Balancing and Parallelizing of RDF Query Processing in P2P Based Distributed RDF Data Stores
  • Author
    Ali, L. ; Janson, Thomas ; Schindelhauer, Christian
  • Author_Institution
    Univ. of Freiburg, Freiburg, Germany
  • fYear
    2014
  • fDate
    12-14 Feb. 2014
  • Firstpage
    307
  • Lastpage
    311
  • Abstract
    For evaluating RDF queries in Peer-to-Peer (P2P) based RDF data stores, the location of an RDF triple in the network must be determinable from a triple pattern in the given query. A strategy used by state-of-the-art distributed RDF data stores to fulfill this requirement is to store each triple at three locations, so that it can be found by its subject, predicate, or object identifier. A major drawback of this strategy is poor load balancing, because the frequencies of subjects, predicates, and objects in triples are not uniformly distributed: while the majority of URIs and literals occur very rarely, some occur very frequently (e.g., the peer responsible for 'rdf:type' is subjected to a very high storage load). In addition, this skewed distribution of RDF triples among network peers leads to an unfair distribution of query processing load and long query processing times. To cope with hotspots caused by unfair data load distribution, we propose an optimized routing index scheme in which triples are indexed on combinations of their subject, predicate, and object components. This paper also shows how this novel index scheme can be exploited to achieve a better distribution of query processing load and faster query response times by bundling the computation resources and bandwidth of peers through parallelism.
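    The storage hotspot described in the abstract can be reproduced with a small sketch. It assumes a hypothetical network of NUM_PEERS = 8 peers and uses SHA-1 modulo the peer count as a stand-in for a DHT's key-to-peer mapping. It compares the three-location baseline (one copy keyed by subject, one by predicate, one by object) against a hypothetical pair-keyed variant (subject|predicate, predicate|object, subject|object). The pair-keyed variant is only an illustration of "indexing on combinations" and is not claimed to be the authors' scheme. All names (peer_for, placement_load, single_component_keys, pair_keys) and the toy data are invented for this example.

```python
import hashlib
from collections import Counter

NUM_PEERS = 8  # hypothetical network size (assumption)


def peer_for(key: str) -> int:
    # SHA-1 modulo the peer count stands in for a DHT's key-to-peer mapping (assumption).
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PEERS


def placement_load(triples, keys_of):
    # Count how many triple copies each peer stores under a given keying rule.
    load = Counter()
    for triple in triples:
        for key in keys_of(triple):
            load[peer_for(key)] += 1
    return load


def single_component_keys(triple):
    # Baseline from the abstract: one copy each under subject, predicate and object.
    s, p, o = triple
    return (s, p, o)


def pair_keys(triple):
    # Hypothetical combination keys; NOT necessarily the scheme proposed in the paper.
    s, p, o = triple
    return (s + "|" + p, p + "|" + o, s + "|" + o)


# Toy data: every resource has an rdf:type triple, so one predicate dominates.
triples = [(f"ex:r{i}", "rdf:type", f"ex:C{i % 50}") for i in range(1000)]

baseline = placement_load(triples, single_component_keys)
combined = placement_load(triples, pair_keys)
print("baseline max load:", max(baseline.values()))
print("pair-key max load:", max(combined.values()))
```

    Under the baseline, every copy keyed by 'rdf:type' lands on the same peer, so that peer holds at least 1000 of the 3000 copies regardless of how the other keys hash. The pair keys split those triples across many distinct keys. Note that in this toy variant a pattern binding only a subject (e.g. ex:r5 ?p ?o) could no longer be routed by a single lookup, so any practical combined index has to address that case; the abstract does not describe how the paper does so.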
  • Keywords
    distributed databases; peer-to-peer computing; query processing; resource allocation; P2P based distributed RDF data stores; RDF query processing parallelization; RDF triple; computation resources; load balancing; long query processing time; network peers; object identifier; object occurrence frequency; peer-to-peer based RDF data stores; peers bandwidth; predicate frequency; resource description framework; skewed RDF triples distribution; subject frequency; unfair data load distribution; unfair query processing load distribution; Bandwidth; Distributed databases; Indexing; Peer-to-peer computing; Query processing; Resource description framework;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on
  • Conference_Location
    Torino
  • ISSN
    1066-6192
  • Type
    conf
  • DOI
    10.1109/PDP.2014.79
  • Filename
    6787291