Regional Information Center for Science and Technology - Parallel Secondo: Boosting Database Engines with Hadoop

DocumentCode :

2980208

Title :

Parallel Secondo: Boosting Database Engines with Hadoop

Author :

Jiamin Lu ; Guting, Ralf Hartmut

Author_Institution :

Fac. of Math. & Comput. Sci., FernUniv. Hagen, Hagen, Germany

fYear :

2012

fDate :

17-19 Dec. 2012

Firstpage :

738

Lastpage :

743

Abstract :

Hadoop is a simple and efficient parallel framework following the MapReduce paradigm, and it has recently made parallel processing a hot issue in data-intensive applications. Since Hadoop can be easily deployed on large-scale clusters of up to thousands of computers, various studies have tried to process common relational database operations on this new platform as well, expecting to achieve remarkable performance. However, these works have to prepare customized programs for each input format, which makes communication between co-workers difficult. Additionally, all intermediate data have to be transformed into key-value pairs and then transferred through the underlying HDFS, so that the data can be processed by Map and Reduce tasks and the workload on the cluster stays balanced. This transformation and transfer introduces unnecessary overhead that reduces both the speed-up and the scale-up of these systems. Therefore, this paper proposes a light and efficient coupling structure that combines Hadoop with single-computer databases at the engine level. On the one hand, it uses a well-designed parallel data model that lets end-users express parallel queries like common queries. All current and future data types and algorithms can be used directly, with no need to adapt them specifically for the parallel platform. On the other hand, it provides a simple and independent distributed file system that transfers data among database engines directly, without passing through HDFS, so as to remove as much unnecessary transformation and transfer overhead as possible. For demonstration purposes, a prototype, Parallel Secondo, is introduced in this paper. It has been fully evaluated on both small- and large-scale clusters, achieving satisfactory performance for different database operations.

Keywords :

data models; network operating systems; parallel databases; parallel programming; public domain software; query processing; relational databases; HDFS; Hadoop; MapReduce paradigm; Parallel Secondo; common queries; common relational database operations; customized programs; data transfer; data types; data-intensive applications; database engines; distributed file system; end-users; key-value pairs; large-scale clusters; parallel data model; parallel framework; parallel processing; parallel queries; single-computer databases; small-scale clusters; Computers; Distributed databases; Distribution functions; Engines; Servers; Trajectory; Hadoop; Hybrid system; moving objects database

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on

Conference_Location :

Singapore

ISSN :

1521-9097

Print_ISBN :

978-1-4673-4565-1

Electronic_ISBN :

1521-9097

Type :

conf

DOI :

10.1109/ICPADS.2012.119

Filename :

6413613

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2980208