Title :
A performance evaluation of Hive for scientific data management
Author :
Taoying Liu ; Jing Liu ; Hong Liu ; Wei Li
Author_Institution :
Inst. of Comput. Technol., Beijing, China
Abstract :
Evaluating MapReduce-based frameworks for scientific data processing applications is important: scientists need a low-cost, scalable, easy-to-use, and fault-tolerant platform for processing large volumes of data. This paper presents an implementation of a scientific data management benchmark, SSDB, on Hive, a MapReduce-based data warehouse. The complete strategy for migrating SSDB to Hive is described in detail, including the HQL implementation of the queries, the data partition schema, and adjustments to the underlying storage facilities. We tuned performance using several system parameters provided by Hive, Hadoop, and HDFS. This paper provides preliminary results and analysis. The evaluation results indicate that Hive achieves acceptable performance for some data analysis tasks, even compared with some highly efficient distributed parallel databases, but it requires careful adjustment of the underlying storage facilities and indexing mechanism.
Keywords :
data analysis; data warehouses; database indexing; parallel programming; query processing; scientific information systems; Hadoop; Hive performance evaluation; MapReduce-based data warehouse; MapReduce-based frameworks; SSDB; data analysis tasks; data partition scheme; indexing mechanism; large-volume data processing; low-cost-scalable-easy-to-use fault-tolerance platform; query HQL implementation; scientific data management benchmark; scientific data processing applications; storage facilities; system parameters; Arrays; Benchmark testing; Data processing; Data warehouses; Distributed databases; Indexing; Standards; Hive; benchmark; performance evaluation; scientific data management
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691696