• DocumentCode
    1824424
  • Title

    Data Intensive Query Processing for Large RDF Graphs Using Cloud Computing Tools

  • Author

    Husain, Mohammad Farhan ; Khan, Latifur ; Kantarcioglu, Murat ; Thuraisingham, Bhavani

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
  • fYear
    2010
  • fDate
    5-10 July 2010
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Cloud computing is the newest paradigm in the IT world and hence the focus of new research. Companies hosting cloud computing services face the challenge of handling data-intensive applications. Semantic web technologies are an ideal candidate for use together with cloud computing tools to provide a solution. These technologies have been standardized by the World Wide Web Consortium (W3C). One such standard is the Resource Description Framework (RDF). With the explosion of semantic web technologies, large RDF graphs are commonplace. Current frameworks do not scale for large RDF graphs. In this paper, we describe a framework that we built using Hadoop, a popular open source framework for cloud computing, to store and retrieve large numbers of RDF triples. We describe a scheme to store RDF data in the Hadoop Distributed File System. We present an algorithm to generate the best possible query plan to answer a SPARQL Protocol and RDF Query Language (SPARQL) query based on a cost model. We use Hadoop's MapReduce framework to answer the queries. Our results show that we can store large RDF graphs in Hadoop clusters built with cheap commodity-class hardware. Furthermore, we show that our framework is scalable and efficient and can easily handle billions of RDF triples, unlike traditional approaches.
  • Keywords
    distributed processing; query languages; query processing; semantic Web; RDF graphs; SPARQL protocol; World Wide Web consortium; cloud computing tools; data intensive query processing; hadoop distributed file system; resource description framework; semantic Web technologies; Cloud computing; Data mining; Data models; Distributed databases; Ontologies; Resource description framework; Cloud; Hadoop; RDF; Semantic Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4244-8207-8
  • Electronic_ISBN
    978-0-7695-4130-3
  • Type

    conf

  • DOI
    10.1109/CLOUD.2010.36
  • Filename
    5558142