  • DocumentCode
    659547
  • Title
    A performance evaluation of Hive for scientific data management
  • Author
    Taoying Liu; Jing Liu; Hong Liu; Wei Li
  • Author_Institution
    Inst. of Comput. Technol., Beijing, China
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    39
  • Lastpage
    46
  • Abstract
    It is important to evaluate MapReduce-based frameworks for scientific data processing applications. Scientists urgently need a low-cost, scalable, easy-to-use and fault-tolerant platform for large-volume data processing. This paper presents an implementation of a scientific data management benchmark, SSDB, on Hive, a MapReduce-based data warehouse. A complete strategy for migrating SSDB to Hive is described in detail, including the HQL implementation of the queries, the data partition scheme, and adjustments to the underlying storage facilities. We tuned performance using several system parameters provided by Hive, Hadoop and HDFS. This paper provides preliminary results and analysis. Evaluation results indicate that Hive achieves acceptable performance for some data analysis tasks, even compared with some highly efficient distributed parallel databases, but it requires fine-tuning of the underlying storage facilities and indexing mechanism.
  • Keywords
    data analysis; data warehouses; database indexing; parallel programming; query processing; scientific information systems; Hadoop; Hive performance evaluation; MapReduce-based data warehouse; MapReduce-based frameworks; SSDB; data analysis tasks; data partition scheme; indexing mechanism; large-volume data processing; low-cost-scalable-easy-to-use fault-tolerance platform; query HQL implementation; scientific data management benchmark; scientific data processing applications; storage facilities; system parameters; Arrays; Benchmark testing; Data processing; Data warehouses; Distributed databases; Indexing; Standards; Hive; benchmark; performance evaluation; scientific data management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691696
  • Filename
    6691696