DocumentCode :
2432506
Title :
Data engineering approach to efficient data warehouse: Life cycle development revisited
Author :
Daneshpour, Negin ; Barfourosh, Ahmad Abdollahzadeh
Author_Institution :
Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol., Tehran, Iran
fYear :
2011
fDate :
15-16 June 2011
Firstpage :
109
Lastpage :
120
Abstract :
A data warehouse (DW) refers to technologies for collecting, integrating, and analyzing large volumes of homogeneous or heterogeneous data to provide information that enables better decision making. To achieve the main purpose of a data warehouse, which is to give analytical responses to online queries, many parameters must be considered in the development life cycle. Among all the factors involved in DW efficiency, the quality of data should be taken more seriously. Today, a data warehouse architecture typically consists of several components that consolidate data from several operational and historical databases to support a variety of front-end query reporting and analytical tools. The back end of the architecture relies mainly on the Extract-Transform-Load (ETL) process, which is usually preferred in the form of a tool. Designing and implementing an application-dependent ETL process to pipeline validated and verified data is labor intensive and typically consumes a large fraction of the effort in data warehouse projects. The outcome of our experiment, in which we built a DW with the recommended methodology on thirty-three million actual population records, confirms that the DW development life cycle has to be revisited. Many works have reported on the impact of data quality on DW efficiency, but less attention has been paid to the data engineering aspects of revising the development life cycle to obtain an efficient DW. Our investigation in this latest experiment shows that the following three steps facilitate the life cycle process and yield a more tailored DW: 1) data cleaning as a pre-processing phase before data cleansing in ETL; 2) identifying query types and their operations before the transform phase of ETL; 3) identifying and materializing a suitable view for each query before the load phase of ETL. The results have been tested with respect to accuracy, effort, and time, and are highly promising.
Keywords :
data warehouses; query processing; data cleaning; data engineering approach; data quality impact; data warehouse architecture; extract-transform-load process; front-end query reporting; life cycle development; Business; Classification algorithms; Cleaning; Data warehouses; Databases; Heuristic algorithms; Time factors; OLAP; data cleaning; data engineering; data warehouse development life cycle; query type classification; view materialization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Software Engineering (CSSE), 2011 CSI International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-61284-206-6
Type :
conf
DOI :
10.1109/CSICSSE.2011.5963983
Filename :
5963983