DocumentCode :
653965
Title :
Scientific Analysis by Queries in Extended SPARQL over a Scalable e-Science Data Store
Author :
Andrejev, Andrej ; Toor, Sukhjinder ; Hellander, Andreas ; Holmgren, Sverker ; Risch, Tore
Author_Institution :
Dept. of Inf. Technol., Uppsala Univ., Uppsala, Sweden
fYear :
2013
fDate :
22-25 Oct. 2013
Firstpage :
98
Lastpage :
106
Abstract :
Data-intensive applications in e-Science require scalable solutions for storage as well as interactive tools for analysis of scientific data. It is important to be able to query the data in a storage-independent way, and to obtain the results of the data analysis incrementally (in contrast to traditional batch solutions). We use the RDF data model extended with multidimensional numeric arrays to represent the results, parameters, and other metadata describing scientific experiments, and SciSPARQL, an extension of the SPARQL language, to combine massive numeric array data and metadata in queries. To address the scalability problem, we present an architecture that enables the same SciSPARQL queries to be executed on the RDF dataset whether it is stored in a relational DBMS or mapped over a specialized geographically distributed e-Science data store. To minimize access and communication costs, we represent the arrays with proxy objects and retrieve their content lazily. We formulate typical analysis tasks from a computational biology application in terms of SciSPARQL queries, and compare the query processing performance with manually written scripts in MATLAB.
Keywords :
data analysis; meta data; query processing; relational databases; scientific information systems; MATLAB; RDF data model; RDF dataset; SciSPARQL queries; access cost minimization; communication cost minimization; computational biology application; content retrieval; data-intensive applications; extended SPARQL language; geographically distributed e-Science data store; interactive tools; metadata; multidimensional numeric arrays; proxy objects; query processing performance; relational DBMS; scalable e-science data store; scientific data analysis; scientific experiments; Arrays; Computational modeling; Data models; Distributed databases; Mathematical model; Resource description framework; extended SPARQL; numeric arrays; scientific data store;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
eScience (eScience), 2013 IEEE 9th International Conference on
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/eScience.2013.19
Filename :
6683896