Title :
Optimizing queries over semantically integrated datasets on MapReduce platforms
Author :
HyeongSik Kim ; Anyanwu, K.
Author_Institution :
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
Abstract :
Life science databases generally consist of multiple heterogeneous datasets that have been integrated using complex ontologies. Querying such databases typically involves complex graph patterns. Evaluating these patterns is challenging when MapReduce-based platforms are used to scale up processing, because they translate into long execution workflows with high disk and network I/O costs. In this poster, we focus on optimizing UNION queries (e.g., unions of conjunctives for inference) and present an algebraic interpretation of query rewritings that are more amenable to efficient processing on MapReduce.
Keywords :
algebra; data mining; ontologies (artificial intelligence); query processing; relational databases; MapReduce platform; UNION queries; algebraic interpretation; complex graph pattern; complex ontology; life science database; query rewriting; semantically integrated datasets; Algebra; Data models; Databases; Ontologies; Optimization; Pattern matching; Resource description framework; Life Science; MapReduce; SPARQL; Union
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691788