• DocumentCode
    685909
  • Title

    A Bloom Filter-Based Approach for Efficient Mapreduce Query Processing on Ordered Datasets

  • Author

    Zhijian Chen ; Dan Wu ; Wenyan Xie ; Jiazhi Zeng ; Jian He ; Di Wu

  • Author_Institution
    Network & Inf. Branch, Guangdong Electr. Power Design Inst., Guangzhou, China
  • fYear
    2013
  • fDate
    13-15 Dec. 2013
  • Firstpage
    93
  • Lastpage
    98
  • Abstract
    The MapReduce processing framework is unaware of the properties of the underlying datasets. For ordered datasets (e.g., time-series data), in which records are already sorted, MapReduce still performs unnecessary sorting operations during execution. This directly results in a significant increase in execution time, as sorting a large volume of data is time-consuming. In this paper, we propose a Bloom filter-based approach to improve the performance of MapReduce when processing ordered datasets. In our approach, all records are stored in a set of Bloom filters after the Mapping phase, and data queries can be processed efficiently by checking the Bloom filters. Owing to the high querying efficiency of Bloom filters, we can achieve a significant performance gain in the Reducing phase. We conduct a series of experiments to evaluate the effectiveness of the proposed approach. Our experimental results show that it achieves a 2x speedup in query processing performance while also reducing CPU and memory utilization. Moreover, we evaluate the scalability of the proposed approach when processing multiple queries, and observe that the speedup improves further as the number of queries increases.
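    The idea described in the abstract (store records in a set of Bloom filters after the map phase, then answer queries by checking the filters) can be illustrated with a minimal sketch. This is not the authors' implementation: the `BloomFilter` class, the one-filter-per-split layout, the filter size, the hash scheme, and the helper names `build_filters` and `candidate_splits` are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions by double hashing over one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def build_filters(splits):
    """Hypothetical map-side step: build one Bloom filter per input split."""
    filters = []
    for records in splits:
        bf = BloomFilter()
        for key in records:
            bf.add(key)
        filters.append(bf)
    return filters


def candidate_splits(filters, key):
    """Return indices of splits that may contain `key` (may include false positives)."""
    return [i for i, bf in enumerate(filters) if key in bf]


# Two already-ordered splits of time-series keys (made-up sample data).
splits = [
    ["2013-12-13T00:00", "2013-12-13T00:05"],
    ["2013-12-14T00:00", "2013-12-14T00:05"],
]
filters = build_filters(splits)
hits = candidate_splits(filters, "2013-12-14T00:05")
```

    In this sketch a query only touches the splits listed in `hits`, and membership tests take constant time per filter. A Bloom filter never reports a stored key as absent, so the split that holds the key always appears in `hits`; other splits may occasionally appear as false positives.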
  • Keywords
    Big Data; data analysis; data structures; distributed programming; query processing; Big Data analytics; Bloom filter-based approach; CPU-memory utilization; MapReduce query processing; data queries; large data volume sorting; mapping phase; ordered datasets; Complexity theory; Data processing; Hardware; Query processing; Resource management; Scalability; Sorting; Bloom Filter-based Approach; Performance; Schedule; Sorting-removed Approach;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Cloud and Big Data (CBD), 2013 International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4799-3260-3
  • Type

    conf

  • DOI
    10.1109/CBD.2013.1
  • Filename
    6824579