• DocumentCode
    653965
  • Title
    Scientific Analysis by Queries in Extended SPARQL over a Scalable e-Science Data Store
  • Author
    Andrejev, Andrej ; Toor, Sukhjinder ; Hellander, Andreas ; Holmgren, Sverker ; Risch, Tore
  • Author_Institution
    Dept. of Inf. Technol., Uppsala Univ., Uppsala, Sweden
  • fYear
    2013
  • fDate
    22-25 Oct. 2013
  • Firstpage
    98
  • Lastpage
    106
  • Abstract
    Data-intensive applications in e-Science require scalable storage solutions as well as interactive tools for analyzing scientific data. It is important to be able to query the data in a storage-independent way and to obtain the results of the data analysis incrementally (in contrast to traditional batch solutions). We use the RDF data model extended with multidimensional numeric arrays to represent the results, parameters, and other metadata describing scientific experiments, and SciSPARQL, an extension of the SPARQL language, to combine massive numeric array data and metadata in queries. To address the scalability problem, we present an architecture that enables the same SciSPARQL queries to be executed on the RDF dataset whether it is stored in a relational DBMS or mapped over a specialized geographically distributed e-Science data store. To minimize access and communication costs, we represent the arrays with proxy objects and retrieve their content lazily. We formulate typical analysis tasks from a computational biology application as SciSPARQL queries and compare the query processing performance with manually written scripts in MATLAB.
  • Keywords
    data analysis; meta data; query processing; relational databases; scientific information systems; MATLAB; RDF data model; RDF dataset; SciSPARQL queries; access cost minimization; communication cost minimization; computational biology application; content retrieval; data-intensive applications; extended SPARQL language; geographically distributed e-Science data store; interactive tools; metadata; multidimensional numeric arrays; proxy objects; query processing performance; relational DBMS; scalable e-science data store; scientific data analysis; scientific experiments; Arrays; Computational modeling; Data models; Distributed databases; Mathematical model; Resource description framework; extended SPARQL; numeric arrays; scientific data store
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    eScience (eScience), 2013 IEEE 9th International Conference on
  • Conference_Location
    Beijing
  • Type
    conf
  • DOI
    10.1109/eScience.2013.19
  • Filename
    6683896