Title :
A data locality optimization algorithm for large-scale data processing in Hadoop
Author :
Zhao, Yanrong ; Wang, Weiping ; Meng, Dan ; Yang, Xiufeng ; Zhang, Shubin ; Li, Jun ; Guan, Gang
Author_Institution :
Inst. of Comput. Technol., Grad. Univ., Beijing, China
Abstract :
Data-intensive applications are increasingly designed to execute on large computing clusters. Our previous observations of Tencent production systems indicate that the join query is one of the most important queries in large-scale data processing. When a join query runs on the Hive system, its job is divided into a map phase and a reduce phase and requires transferring large amounts of intermediate results over the network, which is inefficient. In this paper, we propose an algorithm called CHMJ, whose general idea is to take advantage of data locality to accelerate computation. It consists of four parts: a data distribution strategy, a parallel HashMapJoin algorithm, CoLocation scheduling, and a delay scheduling strategy. CHMJ has been adopted in the Tencent data warehouse and plays an important role in Tencent's daily operations. Our experiments demonstrate the feasibility and efficiency of our solution.
Keywords :
data handling; data warehouses; parallel processing; portals; query processing; scheduling; CHMJ algorithm; Hadoop; Hive system; Internet service portal; Tencent daily operation; Tencent data warehouse; Tencent production system; colocation scheduling; computing cluster; data distribution strategy; data locality optimization algorithm; data-intensive application; delay scheduling strategy; join query; large-scale data processing; map phase; parallel hashmapjoin algorithm; reduce phase; Algorithm design and analysis; Clustering algorithms; Data processing; Delay; Partitioning algorithms; Query processing; Scheduling; Hadoop; MapReduce; join query;
Conference_Title :
Computers and Communications (ISCC), 2012 IEEE Symposium on
Conference_Location :
Cappadocia
Print_ISBN :
978-1-4673-2712-1
Electronic_ISBN :
1530-1346
DOI :
10.1109/ISCC.2012.6249372