DocumentCode
659547
Title
A performance evaluation of Hive for scientific data management
Author
Taoying Liu ; Jing Liu ; Hong Liu ; Wei Li
Author_Institution
Inst. of Comput. Technol., Beijing, China
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
39
Lastpage
46
Abstract
Evaluating MapReduce-based frameworks for scientific data processing applications is important: scientists need a low-cost, scalable, easy-to-use, and fault-tolerant platform for processing large volumes of data. This paper presents an implementation of a scientific data management benchmark, SSDB, on Hive, a MapReduce-based data warehouse. A complete strategy for migrating SSDB to Hive is described in detail, including the HQL implementation of the queries, the data partition scheme, and adjustments to the underlying storage facilities. We tuned performance using several system parameters provided by Hive, Hadoop, and HDFS. This paper provides preliminary results and analysis. The evaluation results indicate that Hive achieves acceptable performance for some data analysis tasks, even compared with some highly efficient distributed parallel databases, but it needs subtle adjustments to the underlying storage facilities and indexing mechanism.
Keywords
data analysis; data warehouses; database indexing; parallel programming; query processing; scientific information systems; Hadoop; Hive performance evaluation; MapReduce-based data warehouse; MapReduce-based frameworks; SSDB; data analysis tasks; data partition scheme; indexing mechanism; large-volume data processing; low-cost-scalable-easy-to-use fault-tolerance platform; query HQL implementation; scientific data management benchmark; scientific data processing applications; storage facilities; system parameters; Arrays; Benchmark testing; Data processing; Data warehouses; Distributed databases; Indexing; Standards; Hive; benchmark; performance evaluation; scientific data management;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691696
Filename
6691696