A Hybrid Join Algorithm on Top of Map Reduce

Author

Hu, Weisong; Ma, Lili; Liu, Xiaowei; Qi, Hongwei; Zha, Li; Liao, Huaming; Zhang, Yuezhou

fYear

2011

fDate

24-26 Oct. 2011

Firstpage

Lastpage

Abstract

Hadoop has shown great power in processing vast amounts of data in parallel. Hive, the database on Hadoop, enables more experts to process relational data by providing a SQL-like interface. However, Hive does not provide an efficient approach for join, a common but expensive operator in relational databases. Given the importance of join, this paper proposes a novel hybrid algorithm, HJA, which automatically chooses the better-performing method among several candidates: divide and memory copy merge, Partition Join (PJ), and the naïve Hive join. Experiments show that HJA achieves the best performance in most situations.

Keywords

SQL; parallel processing; relational databases; HJA; Hadoop; MapReduce; Partition Join; SQL-like interface; naive Hive join; Semantics; auto-tuning

fLanguage

English

Publisher

IEEE

Conference_Titel

Semantics Knowledge and Grid (SKG), 2011 Seventh International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4577-1323-1

Type

conf

DOI

10.1109/SKG.2011.13

Filename

6088090

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2427938