Title :
MapReduce model-based optimization of range queries
Author :
Zhao, Hui ; Yang, Shuqiang ; Chen, Zhikun ; Jin, Songcang ; Yin, Hong ; Li, Long
Author_Institution :
Dept. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
In recent years, the MapReduce parallel computing model has attracted considerable attention from industry and academia. It has proved highly effective at companies such as Google, Yahoo, and Facebook, where it greatly simplifies the design of large-scale data-intensive applications. MapReduce-based systems were originally used to manage massive unstructured and semi-structured data, for example to generate inverted indexes, calculate web page rank, and analyze logs. As a result, current MapReduce systems give little consideration to optimizing the processing of structured data: they process the whole dataset in a brute-force scanning mode, which conflicts with the common workflow of structured data processing, namely range queries and analysis. To address this problem, this paper proposes building a global B-tree-like index on top of the Hadoop Distributed File System (HDFS) for structured data, and using the global index to eliminate unnecessary map tasks during range queries. This reduces the overhead of data I/O and task scheduling, which not only shortens query response time but also improves system resource utilization.
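The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a sorted list searched with binary search stands in for the B-tree-like index, and the names `Split`, `GlobalIndex`, and `splits_for_range` are hypothetical. It also assumes the data is partitioned so that each input split covers a contiguous key range that does not overlap any other split.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical descriptor of one input split and the key range it holds."""
    split_id: int
    min_key: int
    max_key: int


class GlobalIndex:
    """Sorted-array stand-in for a global B-tree-like index over splits.

    Assumption: split key ranges are contiguous and non-overlapping, so
    sorting by min_key also leaves max_key in ascending order.
    """

    def __init__(self, splits):
        self.splits = sorted(splits, key=lambda s: s.min_key)
        self.min_keys = [s.min_key for s in self.splits]
        self.max_keys = [s.max_key for s in self.splits]

    def splits_for_range(self, lo, hi):
        """Return only the splits whose key range intersects [lo, hi]."""
        # Index of the first split whose max_key is >= lo.
        first = bisect_left(self.max_keys, lo)
        # One past the last split whose min_key is <= hi.
        last = bisect_right(self.min_keys, hi)
        return self.splits[first:last]


index = GlobalIndex([
    Split(0, 0, 99),
    Split(1, 100, 199),
    Split(2, 200, 299),
    Split(3, 300, 399),
])

# Only the selected splits would be handed to map tasks; the rest are skipped.
selected = index.splits_for_range(150, 250)
print([s.split_id for s in selected])
```

In this sketch a range query over keys 150 to 250 touches two of the four splits, so under the stated assumptions only two map tasks would need to be scheduled instead of four.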
Keywords :
Web sites; data structures; optimisation; parallel processing; query processing; scheduling; trees (mathematics); Facebook; Google; MapReduce model-based optimization; MapReduce parallel computing model; Web page rank; Yahoo; brute-force scanning mode; global B-tree; hadoop distributed file system; log analysis; query response time; range queries; semi-structured data; structured data processing; system resource utilization; tasks scheduling; unstructured data; Computational modeling; Distributed databases; Indexes; Optimization; Parallel processing; Time factors; MapReduce; big data; cloud computing; distributed computing; global index; hadoop; parallel computing; range query;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
Conference_Location :
Sichuan
Print_ISBN :
978-1-4673-0025-4
DOI :
10.1109/FSKD.2012.6234050