Title :
Lotus: A framework for query optimization based on distributed cache
Author :
Chaoyong Li; Gong Cheng; Jinwen Zhong; Can Ma; Weiping Wang; Dan Meng; Qing Wang; Bo Wang
Author_Institution :
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
Abstract :
As data volumes keep growing, there is strong demand for interactive queries over massive datasets. When processing massive structured data, existing query engines fail to exploit query locality and to account for the differences among query operators, which makes them unsuitable for low-latency business scenarios. To address these problems, this paper proposes Lotus, a framework for query optimization based on a distributed cache. Lotus adopts three strategies: (1) a query-sensitive data distribution policy; (2) cache replacement based on statistical information; and (3) optimization of the behavior of core operators. With these methods, Lotus improves the query performance of existing engines. Experiments show that, compared with SparkSQL or Impala, Lotus can reduce the response latency and execution time of queries on large-scale structured data by more than 30%.
Keywords :
"Query processing","Engines","Business","Distributed databases","Optimization","Benchmark testing","Data communication"
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2015 12th International Conference on
DOI :
10.1109/FSKD.2015.7382111