Title :
Big Data Pre-processing: A Quality Framework
Author :
Taleb, Ikbal ; Dssouli, Rachida ; Serhani, Mohamed Adel
Author_Institution :
CIISE, Concordia Univ., Montreal, QC, Canada
Abstract :
With the abundance of raw data generated from various sources, Big Data has become a preeminent approach to acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that data quality is maintained across all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes that support data quality profile selection and adaptation. In addition, the model tracks and records, in a data provenance repository, the effect of every data transformation performed in the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so significantly saves costs and enables accurate data analysis.
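The abstract describes pre-processing sub-processes (cleansing, filtering, normalization) whose effects are recorded in a data provenance repository. The following is a minimal illustrative sketch of that general idea, not the authors' QBD model. Every name here is hypothetical: the `Pipeline` class, the step functions, the in-memory `provenance` list standing in for a provenance repository, the "completeness" metric used as the quality measure, and the EEG-like channel names and amplitude thresholds in the usage example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]  # one sample; None marks a missing value
Step = Callable[[List[Record]], List[Record]]


def completeness(records: List[Record]) -> float:
    """Hypothetical quality metric: fraction of non-missing attribute values."""
    total = sum(len(r) for r in records)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for v in r.values() if v is not None)
    return present / total


@dataclass
class ProvenanceEntry:
    """One logged transformation: its name and its measured effect."""
    step: str
    rows_before: int
    rows_after: int
    completeness_before: float
    completeness_after: float


@dataclass
class Pipeline:
    steps: List[Tuple[str, Step]] = field(default_factory=list)
    provenance: List[ProvenanceEntry] = field(default_factory=list)  # stand-in repository

    def add(self, name: str, fn: Step) -> "Pipeline":
        self.steps.append((name, fn))
        return self

    def run(self, records: List[Record]) -> List[Record]:
        # Apply each step in order and log its effect on row count and completeness.
        for name, fn in self.steps:
            before = records
            records = fn(records)
            self.provenance.append(ProvenanceEntry(
                name, len(before), len(records),
                completeness(before), completeness(records)))
        return records


def cleanse(records: List[Record]) -> List[Record]:
    """Drop records with any missing value (one simple cleansing policy)."""
    return [r for r in records if all(v is not None for v in r.values())]


def filter_range(lo: float, hi: float) -> Step:
    """Keep records whose values all lie in [lo, hi]; assumes cleansing ran first."""
    def step(records: List[Record]) -> List[Record]:
        return [r for r in records if all(lo <= v <= hi for v in r.values())]
    return step


def normalize(records: List[Record]) -> List[Record]:
    """Min-max scale each attribute to [0, 1]; constant attributes map to 0.0."""
    if not records:
        return records
    keys = list(records[0].keys())
    mins = {k: min(r[k] for r in records) for k in keys}
    maxs = {k: max(r[k] for r in records) for k in keys}
    return [{k: (r[k] - mins[k]) / (maxs[k] - mins[k]) if maxs[k] != mins[k] else 0.0
             for k in keys} for r in records]


# Usage with made-up EEG-like samples (hypothetical channels "c3"/"c4").
raw = [{"c3": 12.0, "c4": 15.0}, {"c3": None, "c4": 9.0},
       {"c3": 400.0, "c4": 5.0}, {"c3": -4.0, "c4": 1.0}]
pipe = (Pipeline().add("cleanse", cleanse)
        .add("filter", filter_range(-100.0, 100.0))
        .add("normalize", normalize))
clean = pipe.run(raw)
for entry in pipe.provenance:
    print(entry)
```

Recording each step's before/after measurements, rather than only the final output, is what lets one trace which transformation changed the data and by how much.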
Keywords :
Big Data; data analysis; EEG dataset; QBD; big data preprocessing; big data processing lifecycle; cleansing process; data provenance repository; data quality profile selection; data quality selection module; data transformation; filtering process; heterogeneous data; integration process; normalization process; quality of big data; Accuracy; Business; Data integration; Distributed databases; Data Quality; pre-processing
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
DOI :
10.1109/BigDataCongress.2015.35