DocumentCode :
2386294
Title :
Integrating DBMSs as a Read-Only Execution Layer into Hadoop
Author :
An, Mingyuan ; Wang, Yang ; Wang, Weiping ; Sun, Ninghui
Author_Institution :
Key Lab. of Comput. Syst. & Archit., Grad. Univ. of Chinese Acad. of Sci., Beijing, China
fYear :
2010
fDate :
8-11 Dec. 2010
Firstpage :
17
Lastpage :
26
Abstract :
To obtain the efficiency of a DBMS, HadoopDB combines Hadoop with DBMSs and claims superior performance over Hadoop. However, HadoopDB simply layers MapReduce on top of unmodified single-node DBMSs, an approach with several obvious weaknesses. In essence, HadoopDB is a parallel DBMS with fault tolerance, and it incurs unnecessary overhead because of its DBMS legacy. Instead of augmenting a DBMS with Hadoop techniques, we propose a new system architecture that integrates modified DBMS engines into Hadoop as a read-only execution layer, in which the DBMS provides efficient read-only operators rather than managing the data. Besides the efficiency gained from the DBMS engine, the architecture has other advantages. The modified DBMS engine processes data directly from HDFS (Hadoop Distributed File System) files at the block level, so data replication is handled naturally by HDFS and block-level parallelism is easily achieved. A global index access mechanism is added in accordance with the MapReduce paradigm. Data loading remains fast because data is written directly into HDFS with simplified logic. Experiments show that our system outperforms both the original Hadoop and a HadoopDB-style system.
Keywords :
data handling; fault tolerant computing; information retrieval; parallel databases; software architecture; DBMS engine; HDFS; Hadoop distributed file system; HadoopDB; MapReduce; block-level parallelism; data processing; database management system; fault tolerance; index access; parallel DBMS; read-only execution layer; single-machined DBMS; system architecture; Engines; Fault tolerance; Fault tolerant systems; Indexes; Loading; Parallel processing; Hadoop; database; global index access; large-scale data processing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-9110-0
Electronic_ISBN :
978-0-7695-4287-4
Type :
conf
DOI :
10.1109/PDCAT.2010.43
Filename :
5704399