DocumentCode :
1924689
Title :
Mastiff: A MapReduce-based System for Time-Based Big Data Analytics
Author :
Guo, Sijie ; Xiong, Jin ; Wang, Weiping ; Lee, Rubao
Author_Institution :
State Key Lab. of Comput. Archit., Inst. of Comput. Technol., Beijing, China
fYear :
2012
fDate :
24-28 Sept. 2012
Firstpage :
72
Lastpage :
80
Abstract :
Existing MapReduce-based warehousing systems are not specifically optimized for time-based big data analysis applications. Such applications have two characteristics: 1) data are continuously generated and must be stored persistently for a long period of time, and 2) applications usually process data from a particular time period, so typical queries use time-related predicates. Time-based big data analytics requires both high data loading speed and high query execution performance. However, existing systems, including current MapReduce-based solutions, do not solve this problem well because the two requirements are contradictory. We have implemented a MapReduce-based system, called Mastiff, which achieves both high data loading speed and high query performance. Mastiff exploits a systematic combination of a column group store structure and a lightweight helper structure. Furthermore, Mastiff uses an optimized table scan method and a column-based query execution engine to boost query performance. Based on extensive experimental results with diverse workloads, we show that Mastiff can significantly outperform existing systems including Hive, HadoopDB, and GridSQL.
Keywords :
data analysis; data warehouses; distributed processing; query processing; storage management; GridSQL; HadoopDB; Hive; MapReduce-based warehousing systems; Mastiff; column group store structure; column-based query execution engine; data continuous generation; data processing; data storage; helper structure; high data loading speed; high query execution performance; optimized table scan method; query performance; time-based big data analytics; time-related predicate query; Data handling; Data storage systems; Engines; Indexes; Information management; Loading; Servers; MapReduce; time-based data analytics;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2422-9
Type :
conf
DOI :
10.1109/CLUSTER.2012.10
Filename :
6337767