• DocumentCode
    2396751
  • Title
    YSmart: Yet Another SQL-to-MapReduce Translator
  • Author
    Lee, Rubao ; Luo, Tian ; Huai, Yin ; Wang, Fusheng ; He, Yongqiang ; Zhang, Xiaodong
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2011
  • fDate
    20-24 June 2011
  • Firstpage
    25
  • Lastpage
    36
  • Abstract
    MapReduce has become an effective approach to big data analytics in large cluster systems, where SQL-like queries play an important role as the interface between users and systems. However, based on our daily operational results at Facebook, certain types of queries are executed at an unacceptably low speed by Hive (a production SQL-to-MapReduce translator). In this paper, we demonstrate that existing SQL-to-MapReduce translators, which operate in a one-operation-to-one-job mode and do not consider query correlations, cannot generate high-performance MapReduce programs for certain queries because of the mismatch between complex SQL structures and the simple MapReduce framework. We propose and develop YSmart, a correlation-aware SQL-to-MapReduce translator. YSmart applies a set of rules to execute multiple correlated operations in a complex query with the minimal number of MapReduce jobs. Compared to existing translators, YSmart can significantly reduce redundant computations, I/O operations, and network transfers. We have implemented YSmart and intensively evaluated it with complex queries on two Amazon EC2 clusters and one Facebook production cluster. The results show that YSmart can outperform Hive and Pig, two widely used SQL-to-MapReduce translators, by more than four times in query execution.
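    The abstract contrasts one-operation-to-one-job translation with executing correlated operations in fewer jobs. The minimal Python sketch below illustrates that contrast. It is not YSmart's actual rule set. The `Op` model, the greedy merge rule, the restriction to a linear chain of operations (real query plans are trees), and the example plan are all hypothetical simplifications. Under the merge rule, an operation shares its predecessor's job when both partition on the same key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical model of a shuffle-based SQL operation."""
    name: str
    partition_key: str  # key the data must be partitioned on in the shuffle


def one_op_per_job(ops: list[Op]) -> list[list[Op]]:
    """Baseline translation: every operation becomes its own MapReduce job."""
    return [[op] for op in ops]


def merge_correlated(ops: list[Op]) -> list[list[Op]]:
    """Greedy merge over a linear chain of operations (simplification).

    An operation is folded into the previous job when it partitions on the
    same key. The data is already grouped by that key after the previous
    job's shuffle, so no extra shuffle (and no extra job) is needed.
    """
    jobs: list[list[Op]] = []
    for op in ops:
        if jobs and jobs[-1][-1].partition_key == op.partition_key:
            jobs[-1].append(op)
        else:
            jobs.append([op])
    return jobs


# Hypothetical plan: aggregate and join on orderkey, then aggregate on custkey.
PLAN = [
    Op("AGG lineitem GROUP BY orderkey", "orderkey"),
    Op("JOIN orders ON orderkey", "orderkey"),
    Op("AGG GROUP BY custkey", "custkey"),
]

if __name__ == "__main__":
    print(len(one_op_per_job(PLAN)), len(merge_correlated(PLAN)))
    for i, job in enumerate(merge_correlated(PLAN), start=1):
        print(f"job {i}: " + "; ".join(op.name for op in job))
```

    The first line printed is `3 2`. The baseline launches three jobs. The merged plan runs the orderkey aggregation and the orderkey join in a single job, which saves one round of intermediate writes and one shuffle.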
  • Keywords
    SQL; program interpreters; query processing; workstation clusters; Amazon EC2 cluster; Facebook; Hive; SQL-like query; YSmart; correlation aware SQL-to-MapReduce translator; query execution; Correlation; Data analysis; Decision support systems; Facebook; Optimization; Production; Programming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems (ICDCS), 2011 31st International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    1063-6927
  • Print_ISBN
    978-1-61284-384-1
  • Electronic_ISBN
    1063-6927
  • Type
    conf
  • DOI
    10.1109/ICDCS.2011.26
  • Filename
    5961685