DocumentCode :
2980208
Title :
Parallel Secondo: Boosting Database Engines with Hadoop
Author :
Jiamin Lu ; Ralf Hartmut Güting
Author_Institution :
Fac. of Math. & Comput. Sci., FernUniv. Hagen, Hagen, Germany
fYear :
2012
fDate :
17-19 Dec. 2012
Firstpage :
738
Lastpage :
743
Abstract :
Hadoop is a simple and efficient parallel framework that follows the MapReduce paradigm, and it has recently made parallel processing a hot topic in data-intensive applications. Since Hadoop can be easily deployed on large-scale clusters of up to thousands of computers, various studies aim to process common relational database operations on this new platform as well, expecting remarkable performance. However, these works have to prepare customized programs for each input format, which makes communication between co-workers difficult. Additionally, all intermediate data have to be transformed into key-value pairs and then transferred through the underlying HDFS, so that the data can be processed by Map and Reduce tasks and the workload stays balanced across the cluster. The unnecessary overhead incurred in this process reduces both the speed-up and the scale-up of these systems. Therefore, this paper proposes a light and efficient coupling structure that combines Hadoop with single-computer databases at the engine level. On the one hand, it uses a well-designed parallel data model that lets end-users express parallel queries like common queries. All current and future data types and algorithms can be used directly, without being specifically changed for the parallel platform. On the other hand, it provides a simple and independent distributed file system that transfers data among database engines directly, without passing through HDFS, thereby removing as much unnecessary transformation and transfer overhead as possible. For demonstration purposes, a prototype, Parallel Secondo, is introduced in this paper. It has been fully evaluated on both small-scale and large-scale clusters, achieving satisfactory performance for different database operations.
Keywords :
data models; network operating systems; parallel databases; parallel programming; public domain software; query processing; relational databases; HDFS; Hadoop; MapReduce paradigm; Parallel Secondo; common queries; common relational database operations; customized programs; data transfer; data types; data-intensive applications; database engines; distributed file system; end-users; key-value pairs; large-scale clusters; parallel data model; parallel framework; parallel processing; parallel queries; single-computer databases; small-scale clusters; Computers; Distributed databases; Distribution functions; Engines; Servers; Trajectory; Hadoop; Hybrid system; moving objects database;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on
Conference_Location :
Singapore
ISSN :
1521-9097
Print_ISBN :
978-1-4673-4565-1
Electronic_ISBN :
1521-9097
Type :
conf
DOI :
10.1109/ICPADS.2012.119
Filename :
6413613