• DocumentCode
    4693
  • Title

    FastRAQ: A Fast Approach to Range-Aggregate Queries in Big Data Environments

  • Author

    Xiaochun Yun ; Guangjun Wu ; Guangyan Zhang ; Keqin Li ; Shupeng Wang

  • Author_Institution
    Inst. of Inf. Eng., Beijing, China
  • Volume
    3
  • Issue
    2
  • fYear
    2015
  • fDate
    April-June 1 2015
  • Firstpage
    206
  • Lastpage
    218
  • Abstract
    Range-aggregate queries apply a certain aggregate function to all tuples within given query ranges. Existing approaches to range-aggregate queries are insufficient to quickly provide accurate results in big data environments. In this paper, we propose FastRAQ, a fast approach to range-aggregate queries in big data environments. FastRAQ first divides big data into independent partitions with a balanced partitioning algorithm, and then generates a local estimation sketch for each partition. When a range-aggregate query request arrives, FastRAQ obtains the result directly by summarizing local estimates from all partitions. FastRAQ has O(1) time complexity for data updates and O(N/P×B) time complexity for range-aggregate queries, where N is the number of distinct tuples for all dimensions, P is the partition number, and B is the bucket number in the histogram. We implement the FastRAQ approach on the Linux platform, and evaluate its performance with about 10 billion data records. Experimental results demonstrate that FastRAQ provides range-aggregate query results within a time period two orders of magnitude lower than that of Hive, while the relative error is less than 3 percent within the given confidence interval.
  • Keywords
    Big Data; Linux; computational complexity; query processing; FastRAQ; Linux platform; balanced partitioning algorithm; big data environments; range-aggregate queries; time complexity; Aggregates; Big data; Histograms; Indexes; Partitioning algorithms; Servers; Vectors; Balanced partition; big data; multidimensional histogram; range-aggregate query;
  • fLanguage
    English
  • Journal_Title
    Cloud Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-7161
  • Type

    jour

  • DOI
    10.1109/TCC.2014.2338325
  • Filename
    6868272
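
The abstract describes a three-step flow: split the data into partitions, keep a local estimation sketch per partition, and answer a range-aggregate query by summing the local estimates. The following is a minimal sketch of that flow for a one-dimensional COUNT query, not the authors' implementation. It assumes round-robin assignment in place of the paper's balanced partitioning algorithm, a plain equi-width histogram in place of the paper's sketch, and uniformly spread values inside each bucket. The names PartitionSketch, build_sketches, and range_count are hypothetical.

```python
class PartitionSketch:
    """Equi-width count histogram over [lo, hi) for one partition (illustrative)."""

    def __init__(self, lo, hi, buckets):
        self.lo = lo
        self.hi = hi
        self.buckets = buckets
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets

    def update(self, value):
        # Constant-time update: find the bucket index and increment its counter.
        # Values are assumed to lie in [lo, hi); the index is clamped for safety.
        idx = int((value - self.lo) / self.width)
        idx = max(0, min(idx, self.buckets - 1))
        self.counts[idx] += 1

    def estimate_count(self, q_lo, q_hi):
        # Estimate the number of values in [q_lo, q_hi). Each bucket contributes
        # its count scaled by the fraction of the bucket the query overlaps
        # (assumption: values are uniform within a bucket).
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(q_hi, b_hi) - max(q_lo, b_lo))
            total += count * overlap / self.width
        return total


def build_sketches(values, partitions, lo, hi, buckets):
    # Round-robin assignment is a stand-in for a balanced partitioning step.
    sketches = [PartitionSketch(lo, hi, buckets) for _ in range(partitions)]
    for i, value in enumerate(values):
        sketches[i % partitions].update(value)
    return sketches


def range_count(sketches, q_lo, q_hi):
    # The global answer is the sum of the per-partition local estimates.
    return sum(s.estimate_count(q_lo, q_hi) for s in sketches)


if __name__ == "__main__":
    sketches = build_sketches(range(100), partitions=4, lo=0, hi=100, buckets=10)
    print(range_count(sketches, 25, 50))
```

In this simplified version each query scans every bucket of every partition, so it illustrates only the partition-then-sum structure and makes no claim about the query complexity stated in the abstract.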