Title :
A PaaS based metadata-driven ETL framework
Author :
Xu, Liutong ; Liao, Jia ; Zhao, Ruixue ; Wu, Bin
Author_Institution :
Beijing Key Lab. of Intell. Telecommun. Software & Multimedia, Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Knowledge discovery has often been used as a background application to motivate technical problems in ETL research. However, traditional ETL tools face new challenges, including very large data volumes and limited computing capacity. Meanwhile, the MapReduce parallel computing model has been widely adopted in recent years. In this paper, we first analyze the problems of existing ETL tools and propose a metadata-driven ETL service model, and then summarize the types of metadata and their application scopes. Based on this service model, we put forward a concrete ETL framework that combines ETL with the MapReduce framework and is provided as PaaS to meet these requirements. We then discuss a number of significant services. Finally, we present strategies for improving the flexibility and extensibility of the framework and for promoting the reusability of ETL components and ETL applications. Practice has shown that the proposed model and framework offer advantages that open-source and commercial ETL tools lack and can handle the processing of large-scale data.
Keywords :
meta data; parallel processing; MapReduce parallel computing model; PaaS based metadata driven ETL framework; knowledge discovery; Computational modeling; Data mining; Data models; Data processing; Data structures; Data warehouses; Testing; ETL framework; PaaS; metadata-driven
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-61284-203-5
DOI :
10.1109/CCIS.2011.6045113