A fault-tolerant environment for large-scale query processing

Author

Kurt, Mehmet Can; Agrawal, Gagan

Author_Institution

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear

2012

fDate

18-22 Dec. 2012

Firstpage

1

Lastpage

10

Abstract

As datasets increase in size, data management and processing needs are being met with added parallelism, i.e., by involving more nodes and/or cores in the system. This, in turn, increases the chances of failures during processing. In this paper, we present the design and implementation of a fault-tolerant environment for processing queries on large scientific datasets. Our system meets the following three requirements, which we consider essential for any such environment: 1) high efficiency when executing a particular data analysis task or query in the absence of failures, 2) the ability to handle the failure of up to a certain number of nodes, and 3) only a modest slowdown in the processing time of a data analysis task or query when failures do occur. We address these challenges by developing a new data replication scheme, which we refer to as subchunk or subpartition replication. Our system currently supports two types of queries, range queries on spatial data and aggregation queries on point datasets, but the underlying ideas can be extended to other query types as well. Our extensive evaluation shows that the system handles single-node and rack failures with only modest slowdowns and, in particular, clearly outperforms the traditional (chunk or partition replication) schemes.

Keywords

data analysis; query processing; data management; data processing; data replication scheme; fault-tolerant environment; large scale query processing; spatial aggregation queries; spatial data; subchunk replication; subpartition replication

fLanguage

English

Publisher

IEEE

Conference_Titel

High Performance Computing (HiPC), 2012 19th International Conference on

Conference_Location

Pune

Print_ISBN

978-1-4673-2372-7

Electronic_ISBN

978-1-4673-2370-3

Type

conf

DOI

10.1109/HiPC.2012.6507487

Filename

6507487