P-ETL: Parallel-ETL based on the MapReduce paradigm

Author

Bala, Mahfoud ; Boussaid, Omar ; Alimazighi, Zaia

Author_Institution

LRDSI, Univ. of Blida 1, Blida, Algeria

fYear

2014

Firstpage

42

Lastpage

49

Abstract

Big data creates opportunities for novel business applications such as “Big Data Analytics” (BDA). However, data at these non-traditional volumes poses a real problem given the capacity constraints of traditional systems. The aim of this paper is to address the impact of big data in a decision-support environment, and more particularly in the data integration phase. In this context, we developed a platform called P-ETL (Parallel-ETL) for extracting (E), transforming (T) and loading (L) very large data into a data warehouse (DW). To cope with very large data, ETL processes under the P-ETL platform run in parallel on a cluster of computers using the MapReduce paradigm. The experiment conducted shows mainly that increasing the number of tasks that process large data speeds up the ETL process.

Keywords

Big Data; data handling; data warehouses; decision support systems; parallel programming; BDA; Big Data analytics; DW; MapReduce paradigm; P-ETL platform; business applications; capacity constraints; computer cluster; data integration phase; data warehouse; decision-support environment; extracting-transforming-and-loading platform; parallel-ETL platform; very large databases; Big data; Data mining; Loading; Merging; Pipelines; Round robin; Unified modeling language

fLanguage

English

Publisher

IEEE

Conference_Title

Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on

Type

conf

DOI

10.1109/AICCSA.2014.7073177

Filename

7073177