DocumentCode :
3582700
Title :
P-ETL: Parallel-ETL based on the MapReduce paradigm
Author :
Bala, Mahfoud ; Boussaid, Omar ; Alimazighi, Zaia
Author_Institution :
LRDSI, Univ. of Blida 1, Blida, Algeria
fYear :
2014
Firstpage :
42
Lastpage :
49
Abstract :
Big data is an opportunity for the emergence of novel business applications such as “Big Data Analytics” (BDA). However, the non-traditional volumes of these data pose a real problem given the capacity constraints of traditional systems. The aim of this paper is to address the impact of big data on a decision-support environment, and more particularly on the data integration phase. In this context, we developed a platform called P-ETL (Parallel-ETL) for extracting (E), transforming (T) and loading (L) very large data into a data warehouse (DW). To cope with very large data, ETL processes on our P-ETL platform run in parallel on a cluster of computers following the MapReduce paradigm. The experiment conducted mainly shows that increasing the number of tasks that process large data speeds up the ETL process.
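To make the MapReduce-style ETL idea in the abstract concrete, here is a minimal sketch of the general pattern. It is not the P-ETL implementation, whose code and interfaces are not described here. Everything in it is a hypothetical illustration: the CSV-like input rows, the function names `partition_round_robin`, `map_task`, `shuffle`, `reduce_task` and `parallel_etl`, and the choice of round-robin partitioning. A thread pool stands in for the cluster nodes, and an in-memory dict stands in for the data warehouse.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical warehouse row: (product_id, amount).
Record = Tuple[str, float]


def partition_round_robin(rows: List[str], n_tasks: int) -> List[List[str]]:
    """Split the extracted rows into n_tasks partitions, one per map task (assumed scheme)."""
    parts: List[List[str]] = [[] for _ in range(n_tasks)]
    for i, row in enumerate(rows):
        parts[i % n_tasks].append(row)
    return parts


def map_task(lines: List[str]) -> List[Record]:
    """Map phase: parse (extract) and normalise (transform) each raw 'key, value' line."""
    out: List[Record] = []
    for line in lines:
        key, _, value = line.partition(",")
        key = key.strip().upper()
        if not key or not value.strip():
            continue  # drop malformed rows instead of failing the whole task
        out.append((key, float(value)))
    return out


def shuffle(mapped: Iterable[List[Record]]) -> Dict[str, List[float]]:
    """Group intermediate (key, value) pairs by key, as MapReduce's shuffle step does."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for part in mapped:
        for key, value in part:
            groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[float]) -> Record:
    """Reduce phase: aggregate all values of one key into a single warehouse row."""
    return key, sum(values)


def parallel_etl(rows: List[str], n_tasks: int) -> Dict[str, float]:
    """Run extract/transform as parallel map tasks, then reduce and 'load' into a dict."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    partitions = partition_round_robin(rows, n_tasks)
    # Threads stand in for cluster nodes; pool.map keeps partition order.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        mapped = list(pool.map(map_task, partitions))
    groups = shuffle(mapped)
    # Load step: the "warehouse" here is simply an in-memory dict.
    return dict(reduce_task(k, v) for k, v in groups.items())


if __name__ == "__main__":
    raw = ["p1, 10", "p2, 5", "P1, 2.5", "bad line", "p3, 1"]
    print(parallel_etl(raw, n_tasks=2))
```

Changing `n_tasks` only changes how the input is split across map tasks, not the loaded result. On a real cluster, that split is what lets more tasks run concurrently. This sketch does not attempt to reproduce the speed-up reported in the paper.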
Keywords :
Big Data; data handling; data warehouses; decision support systems; parallel programming; BDA; Big Data analytics; DW; MapReduce paradigm; P-ETL platform; business applications; capacity constraints; computer cluster; data integration phase; data warehouse; decision-support environment; extracting-transforming-and-loading platform; parallel-ETL platform; very large databases; Big data; Data mining; Loading; Merging; Pipelines; Round robin; Unified modeling language;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on
Type :
conf
DOI :
10.1109/AICCSA.2014.7073177
Filename :
7073177