DocumentCode :
140747
Title :
R-Store: A scalable distributed system for supporting real-time analytics
Author :
Feng Li ; M. Tamer Ozsu ; Gang Chen ; Beng Chin Ooi
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2014
fDate :
March 31 2014-April 4 2014
Firstpage :
40
Lastpage :
51
Abstract :
It is widely recognized that OLTP and OLAP queries differ in their data access patterns, processing needs and requirements. Hence, OLTP and OLAP queries are typically handled by two different systems, and the data are periodically extracted from the OLTP system, transformed and loaded into the OLAP system for analysis. As enterprises increasingly rely on big data to gain useful insights from vast amounts of data, effective and timely decisions derived from real-time analytics become important. It is therefore desirable to provide real-time OLAP querying support, where OLAP queries read the latest data while OLTP queries create new versions. In this paper, we propose R-Store, a scalable distributed system that supports real-time OLAP by extending the MapReduce framework. We extend an open source distributed key/value system, HBase, as the underlying storage system that stores both the data cube and the real-time data. When real-time data are updated, they are streamed to a streaming MapReduce system, namely Hstreaming, which updates the cube on an incremental basis. Based on the metadata stored in the storage system, the MapReduce jobs for OLAP queries use the data cube, the OLTP database, or both. We propose techniques to efficiently scan the real-time data in the storage system, and design an adaptive algorithm that processes real-time queries based on our proposed cost model. The main objectives are to ensure the freshness of answers and low processing latency. Experiments conducted on the TPC-H data set demonstrate the effectiveness and efficiency of our approach.
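The idea in the abstract of answering an OLAP query from a data cube, from fresh real-time data, or from both can be illustrated with a toy sketch. The code below is not R-Store's algorithm or its cost model; it is a minimal in-memory illustration under stated assumptions. The class `ToyStore`, its methods, and the plan names are hypothetical, and the only aggregate supported is a per-key SUM.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a store holding a cube plus fresh updates."""
    # group key -> materialized SUM (the "data cube" in this toy model)
    cube: dict = field(default_factory=lambda: defaultdict(float))
    # updates written since the last cube refresh (the "real-time data")
    pending: list = field(default_factory=list)

    def write(self, key, delta):
        """OLTP-style write: record the change without touching the cube."""
        self.pending.append((key, delta))

    def refresh_cube(self):
        """Incremental maintenance: fold pending deltas into the cube."""
        for key, delta in self.pending:
            self.cube[key] += delta
        self.pending.clear()

    def choose_plan(self):
        """Pick a data source from simple metadata (an assumed rule, not a cost model)."""
        if not self.pending:
            return "cube_only"
        if not self.cube:
            return "realtime_only"
        return "cube_plus_realtime"

    def query_sum(self, key):
        """Fresh answer: cube value plus any deltas not yet merged into it."""
        fresh = sum(d for k, d in self.pending if k == key)
        return self.cube.get(key, 0.0) + fresh
```

For instance, after two writes of 10.0 and 5.0 to the same key, `query_sum` returns 15.0 whether or not `refresh_cube` has run; the refresh only changes which source the answer is drawn from.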
Keywords :
Big Data; data analysis; data mining; distributed processing; meta data; public domain software; query processing; storage management; HBase; Hstreaming; MapReduce framework; OLAP queries; OLTP database; OLTP queries; R-Store; TPC-H data set; adaptive algorithm; big data; data access patterns; data analysis; data cube storage; metadata; open source distributed key-value system; real-time analytics; real-time data storage; scalable distributed system; storage system; Compaction; Computer architecture; Data models; Distributed databases; Educational institutions; Maintenance engineering; Real-time systems;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/ICDE.2014.6816638
Filename :
6816638