DocumentCode :
633070
Title :
Approximate Incremental Big-Data Harmonization
Author :
Agarwal, Prabhakar ; Shroff, Gautam ; Malhotra, Pankaj
Author_Institution :
TCS Innovation Labs., Tata Consultancy Services Ltd., Noida, India
fYear :
2013
fDate :
June 27 2013-July 2 2013
Firstpage :
118
Lastpage :
125
Abstract :
The needs of 'big data analytics' increasingly require IT organizations to ingest, process, and extract business insights from ever larger volumes of data that arrive far more rapidly than before, as well as from new sources such as social media, mobile devices, and sensors. However, in order to extract insights from diverse information feeds from multiple, often unrelated sources, these first need to be correlated or harmonized to a common level of granularity. We formally define this commonly arising data harmonization problem. We show how to correlate disparate data sources using map-reduce, but in an approximate and/or incremental manner as often required in practice. We motivate our techniques through a real-life enterprise data-harmonization case study for which we describe our performance results on big-data technologies, namely, Map Reduce, Hadoop and PIG.
Keywords :
business data processing; data analysis; Hadoop; IT organizations; Map-Reduce; PIG; approximate incremental big-data harmonization; big data analytics; big-data technologies; business insight extraction; business insight ingestion; business insight processing; enterprise data-harmonization; Bismuth; Business; Correlation; Current measurement; Data mining; Indexes; Approximate-Join; BigData; ETL-MR; Harmonization; Incremental ETL; Map-Reduce;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5006-0
Type :
conf
DOI :
10.1109/BigData.Congress.2013.24
Filename :
6597127